We consider the problem of estimating the value $\ell(\varphi)$ of a linear functional,
where the structural function $\varphi$ models a nonparametric relationship in presence
of instrumental variables. We propose a plug-in estimator which is based on a dimension
reduction technique and additional thresholding. It is shown that this estimator is
consistent and can attain the minimax optimal rate of convergence under additional
regularity conditions. This, however, requires an optimal choice of the dimension
parameter $m$ depending on certain characteristics of the structural function $\varphi$
and the joint distribution of the regressor and the instrument, which are unknown in
practice. We propose a fully data driven choice of $m$ which combines model selection
and Lepski's method. We show that the adaptive estimator attains the optimal rate of
convergence up to a logarithmic factor. The theory in this paper is illustrated by
considering classical smoothness assumptions and we discuss examples such as pointwise
estimation or estimation of averages of the structural function $\varphi$.
1. Introduction

We consider estimation of the value of a linear functional of the structural function $\varphi$ in
a nonparametric instrumental regression model. The structural function characterizes the
dependency of a response $Y$ on the variation of an explanatory random variable $Z$ by

$$Y = \varphi(Z) + U \quad\text{with}\quad \mathbb E[U\mid Z] \neq 0 \tag{1.1a}$$

for some error term $U$. In other words, the structural function does not equal the conditional
mean function of $Y$ given $Z$. In this model, however, a sample from $(Y, Z, W)$ is available,
where $W$ is a random variable, an instrument, such that

$$\mathbb E[U\mid W] = 0. \tag{1.1b}$$

Given some a-priori knowledge on the unknown structural function $\varphi$, captured by a function
class $\mathcal F$, its estimation has been intensively discussed in the literature. In contrast, in
this paper we are interested in estimating the value $\ell(\varphi)$ of a continuous linear functional
$\ell : \mathcal F \to \mathbb R$. Important examples discussed in this paper are the weighted average derivative
or the point evaluation functional, which are both continuous under appropriate conditions
on $\mathcal F$. We establish a lower bound of the maximal mean squared error for estimating $\ell(\varphi)$
over a wide range of classes $\mathcal F$ and functionals $\ell$. As a step towards adaptive estimation,
we propose in this paper a plug-in estimator of $\ell(\varphi)$ which is consistent and minimax
optimal. This estimator is based on a linear Galerkin approach which involves the choice
of a dimension parameter. We present a method for choosing this parameter in a data
driven way combining model selection and Lepski's method. Moreover, it is shown that the
adaptive estimator can attain the minimax optimal rate of convergence within a logarithmic
factor.

Model (1.1a-1.1b) was first introduced by Florens [2003] and Newey and Powell [2003],
while its identification has been studied e.g. in Carrasco et al. [2006], Darolles et al. [2002]
and Florens et al. [2011]. Recent applications and extensions of this approach include
nonparametric tests of exogeneity (Blundell and Horowitz [2007]), quantile regression models
(Horowitz and Lee [2007]), or semiparametric modeling (Florens et al. [2009]), to name but
a few. Concerning the estimation of $\varphi$ itself, Ai and Chen [2003], Blundell et al. [2007],
Chen and Reiß [2011] or Newey and Powell [2003] consider sieve minimum distance estimators
of $\varphi$, while Darolles et al. [2002], Hall and Horowitz [2005], Gagliardini and Scaillet
[2006] or Florens et al. [2011] study penalized least squares estimators. A linear Galerkin
approach to construct an estimator of $\varphi$, coming from the inverse problem community (cf.
Efromovich and Koltchinskii [2001] or Hoffmann and Reiß [2008]), has been proposed by
Johannes and Schwarz [2010]. But estimating the structural function $\varphi$ as a whole involves
the inversion of the conditional expectation operator of $Z$ given $W$ and generally leads to an
ill-posed inverse problem (cf. Newey and Powell [2003] or Florens [2003]). This essentially
implies that, under reasonable assumptions, all proposed estimators have very poor rates of
convergence. In contrast, it might be possible to estimate certain local features of $\varphi$, such
as the value of certain linear functionals, at the usual parametric rate of convergence.
The nonparametric estimation of the value of a linear functional from Gaussian white noise
observations is the subject of a considerable literature (cf. Speckman [1979], Li [1982] or
Ibragimov and Has'minskii [1984] in case of direct observations, while in case of indirect
observations we refer to Donoho and Low [1992], Donoho [1994] or Goldenshluger and
Pereverzev [2000]). However, nonparametric instrumental regression is in general not a
Gaussian white noise model. On the other hand, estimation of linear functionals at the
parametric rate in instrumental regression has been addressed in recent years in the
econometrics literature. To be more precise, under restrictive conditions on the linear
functional $\ell$ and the joint distribution of $(Z, W)$ it is shown in Ai and Chen [2007], Santos [2011],
and Severini and Tripathi [2010] that it is possible to construct $n^{1/2}$-consistent estimators
of $\ell(\varphi)$. In this situation, efficiency bounds are derived by Ai and Chen [2007] and, when $\varphi$
is not necessarily identified, by Severini and Tripathi [2010]. We show below, however, that
$n^{1/2}$-consistency is not possible for a wide range of linear functionals $\ell$ and joint
distributions of $(Z, W)$.
In this paper we establish a minimax theory for the nonparametric estimation of the value
of a linear functional $\ell_h(\varphi)$ of the structural function $\varphi$. For this purpose we consider a
plug-in estimator $\widehat\ell_m := \ell_h(\widehat\varphi_m)$ of $\ell_h(\varphi)$, where the estimator $\widehat\varphi_m$ is proposed by Johannes
and Schwarz [2010] and the integer $m$ denotes a dimension to be chosen appropriately. The
accuracy of $\widehat\ell_m$ is measured by its maximal mean squared error uniformly over the classes
$\mathcal F$ and $\mathcal P$, where $\mathcal P$ captures conditions on the unknown joint distribution $P_{UZW}$ of the
random vector $(U,Z,W)$, i.e., $P_{UZW}\in\mathcal P$. The class $\mathcal F$ reflects prior information on the
structural function $\varphi$, e.g., its level of smoothness, and will be constructed flexible enough
to characterize, in particular, differentiable or analytic functions. On the other hand, the
condition $P_{UZW}\in\mathcal P$ specifies, amongst others, some mapping properties of the conditional
expectation operator of $Z$ given $W$, implying a certain decay of its singular values. The
construction of $\mathcal P$ allows us to discuss both a polynomial and an exponential decay of those
singular values. Considering the maximal mean squared error over $\mathcal F$ and $\mathcal P$ we derive a
lower bound for estimating $\ell_h(\varphi)$. Given an optimal choice $m^*_n$ of the dimension we show
that the lower bound is attained by $\widehat\ell_{m^*_n}$ up to a constant $C > 0$, i.e.,

$$\sup_{P_{UZW}\in\mathcal P}\sup_{\varphi\in\mathcal F}\mathbb E|\widehat\ell_{m^*_n}-\ell_h(\varphi)|^2 \le C\,\inf_{\breve\ell}\sup_{P_{UZW}\in\mathcal P}\sup_{\varphi\in\mathcal F}\mathbb E|\breve\ell-\ell_h(\varphi)|^2,$$

where the infimum on the right hand side runs over all possible estimators $\breve\ell$. Thereby,
the estimator $\widehat\ell_{m^*_n}$ is minimax optimal. However, the optimal choice $m^*_n$ depends on the
classes $\mathcal F$ and $\mathcal P$, which are unknown in practice.

The main issue addressed in this paper is the construction of a data driven selection method
for the dimension parameter which adapts to the unknown classes $\mathcal F$ and $\mathcal P$. When
estimating the structural function $\varphi$ as a whole, Loubes and Marteau [2009] and Johannes and
Schwarz [2010] propose adaptive estimators under the condition that the eigenfunctions of
the unknown conditional expectation operator are a-priori given. In contrast, our method
does not involve this a-priori knowledge and, moreover, allows for both a polynomial and an
exponential decay of the associated singular values. The methodology combines a model
selection approach (cf. Barron et al. [1999] and its detailed discussion in Massart [2007])
and Lepski's method (cf. Lepski [1990]). It is inspired by the recent work of Goldenshluger
and Lepski [2010]. To be more precise, the adaptive choice $\widehat m$ is defined as the minimizer
of a random penalized contrast criterion¹, i.e.,

$$\widehat m := \mathop{\arg\min}_{1\le m\le \widehat M_n}\big\{\Psi_m + \widehat{\mathrm{pen}}_m\big\} \tag{1.2a}$$

with random integer $\widehat M_n$ and random penalty sequence $\widehat{\mathrm{pen}} := (\widehat{\mathrm{pen}}_m)_{m\ge1}$ to be defined
below, and the sequence of contrasts $\Psi := (\Psi_m)_{m\ge1}$ given by

$$\Psi_m := \max_{m\le m'\le \widehat M_n}\big\{|\widehat\ell_{m'}-\widehat\ell_m|^2-\widehat{\mathrm{pen}}_{m'}\big\}. \tag{1.2b}$$

With this adaptive choice $\widehat m$ at hand, the estimator $\widehat\ell_{\widehat m}$ is shown to be minimax optimal
within a logarithmic factor over a wide range of classes $\mathcal F$ and $\mathcal P$.
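The selection rule (1.2a-1.2b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the estimates $\widehat\ell_1,\dots,\widehat\ell_M$ and the penalties $\widehat{\mathrm{pen}}_1,\dots,\widehat{\mathrm{pen}}_M$ are taken as given arrays (their construction, like that of $\widehat M_n$, is deferred to Section 4), the maximum in the contrast is read as running over $m\le m'\le M$, and the function name `select_dimension` is hypothetical.

```python
import numpy as np


def select_dimension(ell_hat, pen, M):
    """Hypothetical sketch of the selection rule (1.2a)-(1.2b).

    ell_hat[k] and pen[k] hold the estimate and the penalty for dimension
    m = k + 1; only the first M entries are used. Returns the selected m.
    """
    ell_hat = np.asarray(ell_hat, dtype=float)[:M]
    pen = np.asarray(pen, dtype=float)[:M]
    criterion = np.empty(M)
    for k in range(M):
        # Psi_m: largest penalized squared distance to estimates with m' >= m
        psi = np.max((ell_hat[k:] - ell_hat[k]) ** 2 - pen[k:])
        criterion[k] = psi + pen[k]
    # np.argmin returns the first minimizer, matching the argmin convention
    return int(np.argmin(criterion)) + 1


# The estimates stabilize from m = 2 on, so the rule picks m = 2.
print(select_dimension([0.0, 1.0, 1.0, 1.0], [0.5] * 4, M=4))  # prints 2
```

Taking the first minimizer mirrors the footnote convention, so ties are broken in favour of the smaller dimension.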

¹For a sequence $(a_m)_{m\ge1}$ having a minimal value in $A\subset\mathbb N$ set $\arg\min_{m\in A}\{a_m\} := \min\{m\in A : a_m\le a_{m'}\ \forall m'\in A\}$.
The paper is organized as follows. In Section 2 we introduce our basic model assumptions
and derive a lower bound for estimating the value of a linear functional in nonparametric
instrumental regression. In Section 3 we first show consistency of the proposed estimator
and then that it attains the lower bound up to a constant. We illustrate the general
results by considering classical smoothness assumptions. The applicability of these results
is demonstrated by various examples such as the estimation of the structural function at a
point, of its average or of its weighted average derivative. Finally, in Section 4 we construct
the random upper bound $\widehat M_n$ and the random penalty sequence $\widehat{\mathrm{pen}}$ used in (1.2a-1.2b) to
define the data driven selection procedure for the dimension parameter $m$. The proposed
adaptive estimator is shown to attain the lower bound within a logarithmic factor. All
proofs can be found in the appendix.

2. Complexity of functional estimation: a lower bound. 

2.1. Notations and basic model assumptions. 

The nonparametric instrumental regression model (1.1a-1.1b) leads to a Fredholm equation
of the first kind. To be more precise, let us introduce the conditional expectation operator
$T\phi := \mathbb E[\phi(Z)\mid W]$ mapping $L^2_Z = \{\phi : \mathbb E[\phi^2(Z)] < \infty\}$ to $L^2_W = \{\psi : \mathbb E[\psi^2(W)] < \infty\}$
(which are endowed with the usual inner products $\langle\cdot,\cdot\rangle_Z$ and $\langle\cdot,\cdot\rangle_W$, respectively).
Since $\mathbb E[U\mid W]=0$ implies $\mathbb E[Y\mid W]=\mathbb E[\varphi(Z)\mid W]$, model (1.1a-1.1b) can be written as

$$g = T\varphi, \tag{2.1}$$

where the function $g := \mathbb E[Y\mid W]$ belongs to $L^2_W$. In what follows we always assume that
there exists a unique solution $\varphi\in L^2_Z$ of equation (2.1), i.e., $g$ belongs to the range of $T$ and
the null space of $T$ is trivial (cf. Engl et al. [2000], or Carrasco et al. [2006] in the special
case of nonparametric instrumental regression). Estimation of the structural function $\varphi$ is
thus linked with the inversion of the operator $T$. Moreover, we suppose throughout the
paper that $T$ is compact, which is satisfied under fairly mild assumptions (cf. Carrasco
et al. [2006]). Consequently, a continuous generalized inverse of $T$ does not exist as long as
the range of the operator $T$ is an infinite dimensional subspace of $L^2_W$. This corresponds to
the setup of ill-posed inverse problems.

In this section we show that the obtainable accuracy of any estimator of the value $\ell(\varphi)$
of a linear functional can be essentially determined by regularity conditions imposed on
the structural function $\varphi$ and the conditional expectation operator $T$. In this paper these
conditions are characterized by different weighted norms in $L^2_Z$ with respect to a pre-specified
orthonormal basis $\{e_j\}_{j\ge1}$ in $L^2_Z$, which we formalize now. Given a positive sequence of
weights $w := (w_j)_{j\ge1}$ we define the weighted norm $\|\phi\|_w^2 := \sum_{j\ge1} w_j|\langle\phi,e_j\rangle_Z|^2$, $\phi\in L^2_Z$, the
completion $\mathcal F_w$ of $L^2_Z$ with respect to $\|\cdot\|_w$ and the ellipsoid $\mathcal F_w^r := \{\phi\in\mathcal F_w : \|\phi\|_w^2\le r\}$
with radius $r > 0$. We stress that the basis $\{e_j\}_{j\ge1}$ does not necessarily correspond
to the eigenfunctions of $T$. In the following we write $a_n\lesssim b_n$ when there exists a generic
constant $C > 0$ such that $a_n\le C\,b_n$ for sufficiently large $n\in\mathbb N$, and $a_n\sim b_n$ when $a_n\lesssim b_n$
and $b_n\lesssim a_n$ simultaneously.
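For a coefficient sequence truncated at a finite index, the weighted norm and the ellipsoid membership just defined reduce to a weighted sum of squares. The following Python sketch assumes the coefficients $\langle\phi,e_j\rangle_Z$ are already available as an array and that the truncation is harmless; the helper names are hypothetical.

```python
import numpy as np


def weighted_sq_norm(coef, w):
    """||phi||_w^2 = sum_j w_j <phi, e_j>_Z^2 for coefficients coef[j-1], j = 1..J."""
    coef = np.asarray(coef, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * coef ** 2))


def in_ellipsoid(coef, w, r):
    """Membership in F_w^r = {phi : ||phi||_w^2 <= r} (finite truncation)."""
    return weighted_sq_norm(coef, w) <= r


# Sobolev-type weights w_j = j^(2p) with p = 2 and coefficients j^(-4):
j = np.arange(1, 1001)
coef = j ** -4.0
print(weighted_sq_norm(coef, j ** 4.0))  # partial sum of sum_j j^(-4), close to pi^4 / 90
```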

Minimal regularity conditions. Given a nondecreasing sequence of weights $\gamma := (\gamma_j)_{j\ge1}$,
we suppose, here and subsequently, that the structural function $\varphi$ belongs to the ellipsoid
$\mathcal F_\gamma^\rho$ for some $\rho > 0$. The ellipsoid $\mathcal F_\gamma^\rho$ captures all the prior information (such as smoothness)
about the unknown structural function $\varphi$. Observe that the dual space of $\mathcal F_\gamma$ can be
identified with $\mathcal F_{1/\gamma}$, where $1/\gamma := (\gamma_j^{-1})_{j\ge1}$ (cf. Krein and Petunin [1966]). To be more
precise, for all $\phi\in\mathcal F_\gamma$ the value $\langle h,\phi\rangle_Z$ is well defined for all $h\in\mathcal F_{1/\gamma}$, and by Riesz's
Theorem, for every continuous linear functional $\ell$ on $\mathcal F_\gamma$ there exists a unique $h\in\mathcal F_{1/\gamma}$ such that
$\ell(\phi) = \langle h,\phi\rangle_Z =: \ell_h(\phi)$. In certain applications one might not only be interested in the
performance of an estimation procedure of $\ell_h(\varphi)$ for a given representer $h$, but also for $h$ varying
over the ellipsoid $\mathcal F_\omega^\tau$ with radius $\tau > 0$ for a nonnegative sequence $\omega := (\omega_j)_{j\ge1}$ satisfying
$\inf_{j\ge1}\{\omega_j\gamma_j\} > 0$. Obviously, $\mathcal F_\omega$ is a subset of $\mathcal F_{1/\gamma}$.

Furthermore, as usual in the context of ill-posed inverse problems, we specify some mapping
properties of the operator under consideration. Denote by $\mathcal T$ the set of all compact operators
mapping $L^2_Z$ into $L^2_W$. Given a sequence of weights $v := (v_j)_{j\ge1}$ and $d\ge1$ we define the
subset $\mathcal T_v^d$ of $\mathcal T$ by

$$\mathcal T_v^d := \big\{T\in\mathcal T : \|\phi\|_v^2/d \le \|T\phi\|_W^2 \le d\,\|\phi\|_v^2,\ \forall\phi\in L^2_Z\big\}. \tag{2.2}$$

Notice first that any operator $T\in\mathcal T_v^d$ is injective if the sequence $v$ is strictly positive.
Furthermore, for all $T\in\mathcal T_v^d$ it follows that $v_j/d\le\|Te_j\|_W^2\le d\,v_j$ for all $j\ge1$, and if
$(s_j)_{j\ge1}$ denotes the ordered sequence of singular values of $T$ then it holds $v_j/d\le s_j^2\le d\,v_j$.
In other words, the sequence $v$ specifies the decay of the singular values of $T$. In what follows,
all the results are derived under regularity conditions on the structural function $\varphi$ and the
conditional expectation operator $T$ described through the sequences $\gamma$ and $v$, respectively.
We provide illustrations of these conditions below by assuming a "regular decay" of these
sequences. The next assumption summarizes our minimal regularity conditions on these
sequences.
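The link condition (2.2) can be probed numerically on a finite-dimensional truncation. On $\mathrm{span}\{e_1,\dots,e_m\}$ the quotient $\|T\phi\|_W^2/\|\phi\|_v^2$ ranges between the smallest and the largest eigenvalue of $D_v^{-1/2}A^tAD_v^{-1/2}$, where $A$ is a matrix whose $j$-th column holds the coordinates of $Te_j$ in some orthonormal system of $L^2_W$ and $D_v=\mathrm{diag}(v_1,\dots,v_m)$. The Python sketch below returns the smallest $d$ that works on this subspace; it is only a necessary condition for $T\in\mathcal T_v^d$, and the input matrix $A$ as well as the function name are hypothetical.

```python
import numpy as np


def link_constant(A, v):
    """Smallest d with ||phi||_v^2 / d <= ||T phi||^2 <= d ||phi||_v^2 on span{e_1..e_m}.

    A: (k, m) array, column j = coordinates of T e_{j+1} in an orthonormal
    system of L^2_W (hypothetical finite truncation); v: (m,) positive weights.
    """
    A = np.asarray(A, dtype=float)
    v = np.asarray(v, dtype=float)
    B = A / np.sqrt(v)                   # B = A diag(v^{-1/2}): column j scaled by v_j^{-1/2}
    eig = np.linalg.eigvalsh(B.T @ B)    # ascending eigenvalues of B^t B
    return float(max(eig[-1], 1.0 / eig[0]))


# Diagonal example with s_j^2 = 2 v_j: the condition holds with d = 2.
v = np.array([1.0, 0.25, 0.0625])
print(link_constant(np.diag(np.sqrt(2 * v)), v))
```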

Assumption 1. Let $\gamma := (\gamma_j)_{j\ge1}$, $\omega := (\omega_j)_{j\ge1}$ and $v := (v_j)_{j\ge1}$ be strictly positive
sequences of weights with $\gamma_1 = \omega_1 = v_1 = 1$ such that $\gamma$ is nondecreasing with $|j|^3\gamma_j^{-1} = o(1)$ as
$j\to\infty$, $\omega$ satisfies $\inf_{j\ge1}\{\omega_j\gamma_j\} > 0$, and $v$ is a nonincreasing sequence with $\lim_{j\to\infty}v_j = 0$.

Remark 2.1. We illustrate Assumption 1 for typical choices of $\gamma$ and $v$ usually studied
in the literature (cf. Hall and Horowitz [2005], Chen and Reiß [2011] or Johannes et al.
[2011]), that is, with $[h]_j := \langle h,e_j\rangle_Z$,

(pp) $\gamma_j\sim|j|^{2p}$ with $p>3/2$, $v_j\sim|j|^{-2a}$, $a>0$, and

(i) $[h]_j^2\sim|j|^{-2s}$, $s>1/2-p$, or

(ii) $\omega_j\sim|j|^{2s}$, $s>-p$.

(pe) $\gamma_j\sim|j|^{2p}$, $p>3/2$, and $v_j\sim\exp(-|j|^{2a})$, $a>0$, and

(i) $[h]_j^2\sim|j|^{-2s}$, $s>1/2-p$, or

(ii) $\omega_j\sim|j|^{2s}$, $s>-p$.

(ep) $\gamma_j\sim\exp(|j|^{2p})$, $p>0$, and $v_j\sim|j|^{-2a}$, $a>0$, and

(i) $[h]_j^2\sim|j|^{-2s}$, $s\in\mathbb R$, or

(ii) $\omega_j\sim|j|^{2s}$, $s\in\mathbb R$.

Note that the condition $|j|^3\gamma_j^{-1} = o(1)$ as $j\to\infty$ is automatically satisfied in case of (ep).
In the other two cases, under classical smoothness assumptions, this condition states,
roughly speaking, that the structural function $\varphi$ has to be differentiable. □
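In order to formulate below the lower as well as the upper bound, let us define for $x\ge1$

$$m^*_x := \mathop{\arg\min}_{m\in\mathbb N}\Big\{\max\Big(\frac{v_m}{\gamma_m},\frac1x\Big)\Big\}\quad\text{and}\quad a^*_x := \max\Big\{\frac{v_{m^*_x}}{\gamma_{m^*_x}},\frac1x\Big\}. \tag{2.3}$$

We shall see that the minimax optimal rate is determined by the sequence $\mathcal R^h := (\mathcal R^h_n)_{n\ge1}$
in case of a fixed representer $h$, and by $\mathcal R^\omega := (\mathcal R^\omega_n)_{n\ge1}$ in case of a representer varying over
the class $\mathcal F_\omega^\tau$. These sequences are given for all $x\ge1$ by

$$\mathcal R^h_x := \max\Big\{a^*_x\sum_{j=1}^{m^*_x}\frac{[h]_j^2}{v_j},\ \sum_{j>m^*_x}\frac{[h]_j^2}{\gamma_j}\Big\}\quad\text{and}\quad \mathcal R^\omega_x := a^*_x\max_{1\le j\le m^*_x}\Big\{\frac{1}{\omega_jv_j}\Big\}. \tag{2.4}$$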
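To make (2.3)-(2.4) concrete, the Python sketch below evaluates $m^*_x$, $a^*_x$, $\mathcal R^h_x$ and $\mathcal R^\omega_x$ for sequences truncated at a finite length $J$. It reads (2.3) as $m^*_x=\arg\min_m\max(v_m/\gamma_m,1/x)$ and $a^*_x=\max(v_{m^*_x}/\gamma_{m^*_x},1/x)$, assumes $J$ is large enough that $v_J/\gamma_J\le 1/x$ (otherwise the truncation, not the definition, determines $m^*_x$), and ignores the tail of $\sum_{j>m^*_x}[h]_j^2/\gamma_j$ beyond $J$; the function name is hypothetical.

```python
import numpy as np


def minimax_quantities(x, gamma, v, h2=None, omega=None):
    """Evaluate m*_x, a*_x, R^h_x and R^omega_x of (2.3)-(2.4) on a truncation.

    gamma, v, h2 (= [h]_j^2) and omega are arrays indexed by j = 1..J.
    Returns (m_star, a_star, R_h, R_omega); R_h / R_omega are None if not requested.
    """
    gamma = np.asarray(gamma, dtype=float)
    v = np.asarray(v, dtype=float)
    ratio = v / gamma
    # first minimiser of max(v_m / gamma_m, 1/x), cf. the argmin convention
    k = int(np.argmin(np.maximum(ratio, 1.0 / x)))
    m_star, a_star = k + 1, max(ratio[k], 1.0 / x)
    R_h = R_omega = None
    if h2 is not None:
        h2 = np.asarray(h2, dtype=float)
        first_term = a_star * np.sum(h2[: k + 1] / v[: k + 1])
        tail_term = np.sum(h2[k + 1:] / gamma[k + 1:])
        R_h = float(max(first_term, tail_term))
    if omega is not None:
        omega = np.asarray(omega, dtype=float)
        R_omega = float(a_star * np.max(1.0 / (omega[: k + 1] * v[: k + 1])))
    return m_star, float(a_star), R_h, R_omega


# Case (pp) with p = 2, a = 1: v_j / gamma_j = j^(-6), so m*_x is the first j with j^6 >= x.
j = np.arange(1, 1001)
print(minimax_quantities(2e6, j ** 4.0, j ** -2.0)[:2])  # m* = 12 since 11^6 < 2e6 <= 12^6
```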



In case of adaptive estimation the rate of convergence is given by $\mathcal R^h_{\mathrm{adapt}} := (\mathcal R^h_{n(1+\log n)^{-1}})_{n\ge1}$
and $\mathcal R^\omega_{\mathrm{adapt}} := (\mathcal R^\omega_{n(1+\log n)^{-1}})_{n\ge1}$, respectively. For ease of notation let $m^\circ_n := m^*_{n(1+\log n)^{-1}}$
and $a^\circ_n := a^*_{n(1+\log n)^{-1}}$. The bounds established below need the following additional
assumption, which is satisfied in all cases used to illustrate the results.

Assumption 2. Let $\gamma$ and $v$ be sequences such that

$$0 < \kappa := \inf_{x\ge1}\Big\{(a^*_x)^{-1}\min\Big(\frac1x,\frac{v_{m^*_x}}{\gamma_{m^*_x}}\Big)\Big\}\le1. \tag{2.5}$$
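Assumption 2 can be checked approximately by replacing the infimum over $x\ge1$ in (2.5) by a minimum over a finite grid. The sketch below does this for truncated sequences, reading (2.3) as $m^*_x=\arg\min_m\max(v_m/\gamma_m,1/x)$ and assuming the truncation is long enough for every grid point; a grid can only give an upper estimate of $\kappa$, and the function name is hypothetical.

```python
import numpy as np


def kappa_on_grid(xs, gamma, v):
    """Grid approximation (from above) of kappa in (2.5)."""
    ratio = np.asarray(v, dtype=float) / np.asarray(gamma, dtype=float)
    values = []
    for x in xs:
        k = int(np.argmin(np.maximum(ratio, 1.0 / x)))  # 0-based index of m*_x
        a_star = max(ratio[k], 1.0 / x)
        values.append(min(1.0 / x, ratio[k]) / a_star)
    return float(min(values))


# Case (pp) with p = 2, a = 1 on a logarithmic grid of x in [1, 1e12].
j = np.arange(1, 1001)
print(kappa_on_grid(np.logspace(0, 12, 200), j ** 4.0, j ** -2.0))
```

For these exact power sequences $m^*_x$ is the first $m$ with $m^6\ge x$, so each grid value equals $x/(m^*_x)^6>((m^*_x-1)/m^*_x)^6\ge2^{-6}$ for $x>1$, which keeps the estimate bounded away from zero.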



2.2. Lower bounds. 

The results derived below involve assumptions on the conditional moments of the random
variable $U$ given $W$, captured by $\mathcal U_\sigma$, which contains all conditional distributions of $U$ given
$W$, denoted by $P_{U|W}$, satisfying $\mathbb E[U\mid W] = 0$ and $\mathbb E[U^4\mid W]\le\sigma^4$ for some $\sigma > 0$. The next
assertion gives a lower bound for the mean squared error of any estimator when estimating
the value $\ell_h(\varphi)$ of a linear functional with given representer $h$ and structural function $\varphi$ in
the function class $\mathcal F_\gamma^\rho$.

Theorem 2.1. Assume an iid. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b). Let $\gamma$
and $v$ be sequences satisfying Assumptions 1 and 2. Suppose that $\sup_{j\ge1}\mathbb E[e_j^4(Z)\mid W]\le\eta^4$,
$\eta\ge1$, and $\sigma^4\ge\big(\sqrt3+4\rho\eta^2\sum_{j\ge1}\gamma_j^{-1}\big)^2$. Then for all $n\ge1$ we have

$$\inf_{\breve\ell}\ \inf_{T\in\mathcal T_v^d}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\breve\ell-\ell_h(\varphi)|^2 \ \ge\ \frac{\kappa}{4}\min\Big(\frac{1}{2d},\rho\Big)\,\mathcal R^h_n,$$

where the first infimum runs over all possible estimators $\breve\ell$.

Remark 2.2. In the proof of the lower bound we consider a test problem based on two
hypothetical structural functions. For each test function the condition $\sigma^4\ge\big(\sqrt3+4\rho\eta^2\sum_{j\ge1}\gamma_j^{-1}\big)^2$
ensures a certain complexity of the hypothetical model in the sense that it allows for
Gaussian residuals. This specific case is only needed to simplify the calculation of the distance
between distributions corresponding to different structural functions. A similar assumption
has been used by Chen and Reiß [2011] in order to derive a lower bound for the estimation
of the structural function $\varphi$ itself. In particular, the authors show that, in contrast to the
present work, a one-dimensional subproblem is not sufficient to describe the full difficulty
in estimating $\varphi$.
On the other hand, below we derive an upper bound assuming that $P_{U|W}$ belongs to $\mathcal U_\sigma$ and
that the joint distribution of $(Z, W)$ fulfills in addition Assumption 3. Obviously, in this
situation Theorem 2.1 provides a lower bound for any estimator as long as $\sigma$ is sufficiently
large. □

Remark 2.3. The regularity conditions imposed on the structural function $\varphi$ and the
conditional expectation operator $T$ involve only the basis $\{e_j\}_{j\ge1}$ in $L^2_Z$. Therefore, the lower
bound derived in Theorem 2.1 does not capture the influence of the basis $\{f_l\}_{l\ge1}$ in $L^2_W$
used below to construct an estimator of the value $\ell_h(\varphi)$. In other words, this estimator
attains the lower bound only if $\{f_l\}_{l\ge1}$ is chosen appropriately. □

Remark 2.4. The rate $\mathcal R^h$ of the lower bound is never faster than the parametric rate, that
is, $\mathcal R^h_n\gtrsim n^{-1}$. Moreover, it is easily seen that the lower bound rate is parametric if and only
if $\sum_{j\ge1}[h]_j^2v_j^{-1} < \infty$. This condition does not involve the sequence $\gamma$ and hence, attaining
a parametric rate is independent of the regularity conditions we are willing to impose on
the structural function. □

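The following assertion gives a lower bound over the ellipsoid $\mathcal F_\omega^\tau$ of representers. Consider
the function $h^* := \tau^{1/2}\omega_{j^*}^{-1/2}e_{j^*}$ with $j^* := \arg\max_{1\le j\le m^*_n}\{(\omega_jv_j)^{-1}\}$, which obviously belongs
to $\mathcal F_\omega^\tau$. Corollary 2.2 then follows by calculating the value of the lower bound in Theorem
2.1 for the specific representer $h^*$, and hence we omit its proof.

Corollary 2.2. Let the assumptions of Theorem 2.1 be satisfied. Then for all $n\ge1$ we
have

$$\inf_{\breve\ell}\ \inf_{T\in\mathcal T_v^d}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho,\,h\in\mathcal F_\omega^\tau}\mathbb E|\breve\ell-\ell_h(\varphi)|^2\ \ge\ \frac{\kappa\,\tau}{4}\min\Big(\frac{1}{2d},\rho\Big)\,\mathcal R^\omega_n,$$

where the first infimum runs over all possible estimators $\breve\ell$.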

Remark 2.5. If the lower bound given in Corollary 2.2 tends to zero, then $(\omega_jv_j)_{j\ge1}$ is a
divergent sequence. In other words, without any additional restriction on $\varphi$, that is, $\gamma\equiv1$,
consistency of an estimator of $\ell_h(\varphi)$ uniformly over all $\varphi\in\mathcal F_\gamma^\rho$ and all $h\in\mathcal F_\omega^\tau$ is only
possible under restrictions on the representer $h$ in the sense that $(\omega_jv_j)_{j\ge1}$ has to be a divergent
sequence. This obviously reflects the ill-posedness of the underlying inverse problem. □



3. Minimax optimal estimation. 

3.1. Estimation by dimension reduction and thresholding. 

In addition to the basis $\{e_j\}_{j\ge1}$ in $L^2_Z$ used to establish the lower bound, we now
consider also a second basis $\{f_l\}_{l\ge1}$ in $L^2_W$.

Matrix and operator notations. Given $m\ge1$, $\mathcal E_m$ and $\mathcal F_m$ denote the subspaces of $L^2_Z$ and
$L^2_W$ spanned by the functions $\{e_j\}_{j=1}^m$ and $\{f_l\}_{l=1}^m$, respectively. $E_m$ and $E_m^\perp$ (resp. $F_m$ and
$F_m^\perp$) denote the orthogonal projections on $\mathcal E_m$ (resp. $\mathcal F_m$) and its orthogonal complement
$\mathcal E_m^\perp$ (resp. $\mathcal F_m^\perp$), respectively. Given an operator $K$ from $L^2_Z$ to $L^2_W$ we denote its inverse
by $K^{-1}$ and its adjoint by $K^*$. If we restrict $F_mKE_m$ to an operator from $\mathcal E_m$ to $\mathcal F_m$, then
it can be represented by a matrix $[K]_m$ with entries $[K]_{l,j} = \langle Ke_j,f_l\rangle_W$ for $1\le j,l\le m$.
Its spectral norm is denoted by $\|[K]_m\|$, its inverse by $[K]_m^{-1}$ and its transpose by $[K]_m^t$.
We write $I$ for the identity operator and $\nabla_v$ for the diagonal operator with singular value
decomposition $\{v_j,e_j,f_j\}_{j\ge1}$. Respectively, given functions $\phi\in L^2_Z$ and $\psi\in L^2_W$ we define
by $[\phi]_m$ and $[\psi]_m$ the $m$-dimensional vectors with entries $[\phi]_j = \langle\phi,e_j\rangle_Z$ and $[\psi]_l = \langle\psi,f_l\rangle_W$ for
$1\le j,l\le m$.

Consider the conditional expectation operator $T$ associated with $(Z,W)$. If $[e(Z)]_m$ and
$[f(W)]_m$ denote random vectors with entries $e_j(Z)$ and $f_j(W)$, $1\le j\le m$, respectively,
then, since $\langle Te_j,f_l\rangle_W = \mathbb E\big[\mathbb E[e_j(Z)\mid W]f_l(W)\big] = \mathbb E[e_j(Z)f_l(W)]$, it holds $[T]_m = \mathbb E\,[f(W)]_m[e(Z)]_m^t$.
Throughout the paper $[T]_m$ is assumed to be nonsingular for all $m\ge1$, so that $[T]_m^{-1}$ always exists.
Note that it is a nontrivial problem to determine when such an assumption holds (cf. Efromovich
and Koltchinskii [2001] and references therein).



Definition of the estimator. Let $(Y_1,Z_1,W_1),\dots,(Y_n,Z_n,W_n)$ be an iid. sample of $(Y,Z,W)$.
Since $[T]_m = \mathbb E\,[f(W)]_m[e(Z)]_m^t$ and $[g]_m = \mathbb E\,Y[f(W)]_m$, we construct estimators by using
their empirical counterparts, that is,

$$[\widehat T]_m := \frac1n\sum_{i=1}^n[f(W_i)]_m[e(Z_i)]_m^t\quad\text{and}\quad[\widehat g]_m := \frac1n\sum_{i=1}^nY_i[f(W_i)]_m.$$

Then the estimator of the linear functional $\ell_h(\varphi)$ is defined for all $m\ge1$ by

$$\widehat\ell_m := \begin{cases}[h]_m^t[\widehat T]_m^{-1}[\widehat g]_m, & \text{if }[\widehat T]_m\text{ is nonsingular and }\|[\widehat T]_m^{-1}\|\le\sqrt n,\\ 0, & \text{otherwise}.\end{cases} \tag{3.1}$$

In fact, the estimator $\widehat\ell_m$ is obtained from the linear functional $\ell_h(\varphi)$ by replacing the
unknown structural function $\varphi$ by an estimator proposed by Johannes and Schwarz [2010].
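A direct transcription of (3.1) is sketched below in Python. It assumes the basis functions have already been evaluated at the sample points, i.e. the caller supplies matrices with entries $e_j(Z_i)$ and $f_l(W_i)$ together with the coefficient vector $[h]_m$; the function name and the argument layout are hypothetical. As in (3.1), the estimate is set to zero when $[\widehat T]_m$ is singular or when the spectral norm of its inverse exceeds $\sqrt n$.

```python
import numpy as np


def ell_hat(Y, eZ, fW, h_m):
    """Thresholded plug-in estimator (3.1) of ell_h(phi).

    Y: (n,) responses; eZ: (n, m) with eZ[i, j-1] = e_j(Z_i);
    fW: (n, m) with fW[i, l-1] = f_l(W_i); h_m: (m,) coefficients [h]_m.
    """
    Y, eZ, fW, h_m = (np.asarray(a, dtype=float) for a in (Y, eZ, fW, h_m))
    n = Y.shape[0]
    T_hat = fW.T @ eZ / n      # [T_hat]_{l,j} = (1/n) sum_i f_l(W_i) e_j(Z_i)
    g_hat = fW.T @ Y / n       # [g_hat]_l = (1/n) sum_i Y_i f_l(W_i)
    try:
        T_inv = np.linalg.inv(T_hat)
    except np.linalg.LinAlgError:              # exactly singular: estimator is 0
        return 0.0
    if np.linalg.norm(T_inv, 2) > np.sqrt(n):  # spectral-norm threshold of (3.1)
        return 0.0
    return float(h_m @ T_inv @ g_hat)
```

In practice one would evaluate this for $m=1,\dots,\widehat M_n$ and hand the resulting sequence to a data driven selection rule.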

Remark 3.1. If $Z$ is continuously distributed, one might also be interested in estimating
the value $\int_{\mathcal Z}\varphi(z)h(z)\,dz$, where $\mathcal Z$ is the support of $Z$. Assume that this integral and also
$\int_{\mathcal Z}h(z)e_j(z)\,dz$ for $1\le j\le m$ are well defined. Then we can cover the problem of estimating
$\int_{\mathcal Z}\varphi(z)h(z)\,dz$ by simply replacing $[h]_m$ in the definition of $\widehat\ell_m$ by an $m$-dimensional vector
with entries $\int_{\mathcal Z}h(z)e_j(z)\,dz$ for $1\le j\le m$. Hence for $\int_{\mathcal Z}\varphi(z)h(z)\,dz$ the results below follow
mutatis mutandis. □



Moment assumptions. Besides the link condition (2.2) for the conditional expectation
operator $T$ we need moment conditions on the basis, more specifically, on the random variables
$e_j(Z)$ and $f_l(W)$ for $j,l\ge1$, which we summarize in the next assumption.

Assumption 3. There exists $\eta\ge1$ such that the joint distribution of $(Z,W)$ satisfies

(i) $\sup_{j\in\mathbb N}\mathbb E[e_j^2(Z)\mid W]\le\eta^2$ and $\sup_{l\in\mathbb N}\mathbb E[f_l^4(W)]\le\eta^4$;

(ii) $\sup_{j,l\in\mathbb N}\mathbb E|e_j(Z)f_l(W)-\mathbb E[e_j(Z)f_l(W)]|^k\le\eta^k\,k!$, $k=3,4,\dots$.

Note that condition (ii) is also known as Cramér's condition, which is sufficient to obtain
an exponential bound for large deviations of the centered random variable $e_j(Z)f_l(W)-\mathbb E[e_j(Z)f_l(W)]$
(cf. Bosq [1998]). Moreover, any joint distribution of $(Z,W)$ satisfies
Assumption 3 for sufficiently large $\eta$ if the bases $\{e_j\}_{j\ge1}$ and $\{f_l\}_{l\ge1}$ are uniformly bounded,
which holds, for example, for the trigonometric basis considered in Subsection 3.4.
3.2. Consistency.

The next assertion summarizes sufficient conditions to ensure consistency of the estimator
$\widehat\ell_m$ introduced in (3.1). Let $\varphi_m\in\mathcal E_m$ with $[\varphi_m]_m = [T]_m^{-1}[g]_m$. Obviously, up to the
threshold, the estimator $\widehat\ell_m$ is the empirical counterpart of $\ell_h(\varphi_m)$. In Proposition 3.1
consistency of the estimator $\widehat\ell_m$ is only obtained under the condition

$$\|\varphi-\varphi_m\|_\gamma = o(1)\quad\text{as } m\to\infty, \tag{3.2}$$

which does not hold true in general. Since $|\ell_h(\varphi)-\ell_h(\varphi_m)|\le\|h\|_{1/\gamma}\,\|\varphi-\varphi_m\|_\gamma$ by the Cauchy-Schwarz
inequality, (3.2) implies the convergence of $\ell_h(\varphi_m)$ to $\ell_h(\varphi)$ as $m$ tends to infinity for all $h\in\mathcal F_{1/\gamma}$.

Proposition 3.1. Assume an iid. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b) with
$P_{U|W}\in\mathcal U_\sigma$ and joint distribution of $(Z,W)$ fulfilling Assumption 3. Let the dimension
parameter $m_n$ satisfy $m_n^{-1} = o(1)$, $m_n = o(n)$,

$$\|[h]_{m_n}^t[T]_{m_n}^{-1}\|^2 = o(n)\quad\text{and}\quad m_n\|[T]_{m_n}^{-1}\|^2 = o(n)\quad\text{as } n\to\infty. \tag{3.3}$$

If (3.2) holds true then $\mathbb E|\widehat\ell_{m_n}-\ell_h(\varphi)|^2 = o(1)$ as $n\to\infty$ for all $\varphi\in\mathcal F_\gamma$ and $h\in\mathcal F_{1/\gamma}$.

Notice that condition (3.2) also involves the basis $\{f_l\}_{l\ge1}$ in $L^2_W$. In what follows, we
introduce an alternative but stronger condition to guarantee (3.2) which extends the link
condition (2.2). We denote by $\mathcal T_v^{d,D}$ for some $D\ge d$ the subset of $\mathcal T_v^d$ given by

$$\mathcal T_v^{d,D} := \Big\{T\in\mathcal T_v^d : \sup_{m\in\mathbb N}\|[\nabla_v]_m^{1/2}[T]_m^{-1}\|^2\le D\Big\}. \tag{3.4}$$
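The supremum in (3.4) can be approximated by a maximum over $m\le M$ once a finite section of the matrix $(\langle Te_j,f_l\rangle_W)_{l,j\ge1}$ is available. The Python sketch below assumes such a hypothetical $M\times M$ array is given (in practice it would have to be estimated) and returns $\max_{m\le M}\|[\nabla_v]_m^{1/2}[T]_m^{-1}\|^2$, which bounds the supremum from below; the function name is hypothetical.

```python
import numpy as np


def extended_link_constant(T_full, v, M):
    """max_{m <= M} ||[nabla_v]_m^{1/2} [T]_m^{-1}||^2, a finite-M proxy for the sup in (3.4).

    T_full: array whose leading m x m block is [T]_m (entries <T e_j, f_l>_W);
    v: weights (v_j); M: largest dimension considered.
    """
    T_full = np.asarray(T_full, dtype=float)
    v = np.asarray(v, dtype=float)
    best = 0.0
    for m in range(1, M + 1):
        # diag(v_1^{1/2}, ..., v_m^{1/2}) times the inverse of the leading block [T]_m
        B = np.sqrt(v[:m])[:, None] * np.linalg.inv(T_full[:m, :m])
        best = max(best, np.linalg.norm(B, 2) ** 2)
    return float(best)


# Diagonal T with s_j^2 = v_j / d (cf. Remark 3.2): the value equals d.
v = 2.0 ** -np.arange(6)
d = 3.0
print(extended_link_constant(np.diag(np.sqrt(v / d)), v, M=6))
```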



Remark 3.2. If $T\in\mathcal T_v^d$ and if in addition its singular value decomposition is given by
$\{s_j,e_j,f_j\}_{j\ge1}$, then for all $m\ge1$ the matrix $[T]_m$ is diagonal with diagonal entries
$[T]_{jj} = s_j$, $1\le j\le m$. In this situation $\|[\nabla_v]_m^{1/2}[T]_m^{-1}\|^2 = \max_{1\le j\le m}v_j/s_j^2$, and since $s_j^2\ge v_j/d$
it is easily seen that $\sup_{m\in\mathbb N}\|[\nabla_v]_m^{1/2}[T]_m^{-1}\|^2\le d$. Hence $T$ satisfies the extended link
condition (3.4), that is, $T\in\mathcal T_v^{d,d}$. Furthermore, it holds $\mathcal T_v^d = \mathcal T_v^{d,D}$ for suitable $D > 0$ if $T$ is a
small perturbation of $\nabla_v^{1/2}$ or if $T$ is strictly positive (cf. Efromovich and Koltchinskii [2001]
or Cardot and Johannes [2010], respectively). □

Remark 3.3. Once both bases $\{e_j\}_{j\ge1}$ and $\{f_l\}_{l\ge1}$ are specified, the extended link condition
(3.4) restricts the class of joint distributions of $(Z,W)$ such that (3.2) holds true. Moreover,
under (3.4) the estimator $\widehat\varphi_m$ of $\varphi$ proposed by Johannes and Schwarz [2010] can attain the
minimax optimal rate. In this sense, given a joint distribution of $(Z,W)$, a basis $\{f_l\}_{l\ge1}$
satisfying condition (3.4) can be interpreted as optimal instruments (cf. Newey [1990]). □

Remark 3.4. For each pre-specified basis $\{e_j\}_{j\ge1}$ we can theoretically construct a basis
$\{f_l\}_{l\ge1}$ such that (3.4) is equivalent to the link condition (2.2). To be more precise, if
$T\in\mathcal T_v^d$, which involves only the basis $\{e_j\}_{j\ge1}$, then the fundamental inequality of Heinz
[1951] implies $\|(T^*T)^{-1/2}e_j\|_Z^2\le d\,v_j^{-1}$. Thereby, the function $(T^*T)^{-1/2}e_j$ is an element
of $L^2_Z$ and, hence, $f_j := T(T^*T)^{-1/2}e_j$, $j\ge1$, belongs to $L^2_W$. Then it is easily checked
that $\{f_l\}_{l\ge1}$ is a basis of the closure of the range of $T$, which may be completed to a basis
of $L^2_W$. Obviously, $[T]_m$ is symmetric and, moreover, strictly positive since $\langle Te_j,f_l\rangle_W =
\langle(T^*T)^{1/2}e_j,e_l\rangle_Z$ for all $j,l\ge1$. Thereby, we can apply Lemma A.3 in Cardot and Johannes
[2010], which gives $\mathcal T_v^d = \mathcal T_v^{d,D}$ for sufficiently large $D$. We are currently exploring the data
driven choice of the basis $\{f_l\}_{l\ge1}$. □
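In finite dimensions the construction of Remark 3.4 is the orthogonal factor of the polar decomposition: if a square invertible matrix $A$ represents $T$ and $A=U\,\mathrm{diag}(s)\,V^t$ is its singular value decomposition, then $A(A^tA)^{-1/2}=UV^t$. The Python sketch below is a finite-dimensional illustration with a hypothetical random matrix; it computes these instruments and checks that the resulting matrix with entries $\langle Ae_j,f_l\rangle$ equals $(A^tA)^{1/2}$ and is therefore symmetric and strictly positive. The function name is hypothetical.

```python
import numpy as np


def polar_instruments(A):
    """Columns f_j = A (A^t A)^{-1/2} e_j for an invertible square matrix A.

    Returns (F, T_new) where T_new[l, j] = <A e_j, f_l>, i.e. F^t A.
    """
    U, s, Vt = np.linalg.svd(A)
    F = U @ Vt            # orthogonal factor of the polar decomposition
    T_new = F.T @ A       # equals Vt.T @ diag(s) @ Vt = (A^t A)^{1/2}
    return F, T_new


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
F, T_new = polar_instruments(A)
print(np.allclose(T_new, T_new.T), np.linalg.eigvalsh(T_new).min() > 0)
```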
Under the extended link condition (3.4) the next assertion summarizes sufficient conditions
to ensure consistency.

Corollary 3.2. The conclusion of Proposition 3.1 still holds true without imposing
condition (3.2), if the sequence $v$ satisfies Assumption 1, the conditional expectation operator
$T$ belongs to $\mathcal T_v^{d,D}$, and (3.3) is substituted by

$$\sum_{j=1}^{m_n}\frac{[h]_j^2}{v_j} = o(n)\quad\text{and}\quad m_n^3 = O(n\,v_{m_n})\quad\text{as } n\to\infty. \tag{3.5}$$

3.3. An upper bound. 

The last assertions show that the estimator $\widehat\ell_m$ defined in (3.1) is consistent for all structural
functions and representers belonging to $\mathcal F_\gamma$ and $\mathcal F_{1/\gamma}$, respectively. The following theorem
now provides an upper bound if $\varphi$ belongs to an ellipsoid $\mathcal F_\gamma^\rho$. In this situation the rate $\mathcal R^h$
of the lower bound given in Theorem 2.1 provides, up to a constant, also an upper bound
for the estimator $\widehat\ell_{m^*_n}$. Consequently, the rate $\mathcal R^h$ is optimal and, hence, $\widehat\ell_{m^*_n}$ is
minimax optimal.

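Theorem 3.3. Assume an iid. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b) with
joint distribution of $(Z,W)$ fulfilling Assumption 3. Let Assumptions 1 and 2 be satisfied.
Suppose that the dimension parameter $m^*_n$ given by (2.3) satisfies

$$(m^*_n)^3\max\big\{|\log\mathcal R^h_n|,\ \log m^*_n\big\} = o(\gamma_{m^*_n})\quad\text{as } n\to\infty; \tag{3.6}$$

then we have for all $n\ge1$

$$\sup_{T\in\mathcal T_v^{d,D}}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\widehat\ell_{m^*_n}-\ell_h(\varphi)|^2\le C\,\mathcal R^h_n$$

for a constant $C > 0$ only depending on the classes $\mathcal F_\gamma^\rho$, $\mathcal T_v^{d,D}$, the constants $\sigma$, $\eta$ and the
representer $h$.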

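The following assertion states an upper bound uniformly over the class $\mathcal F_\omega^\tau$ of representers.
Observe that $\|h\|_{1/\gamma}^2\le\tau$ and $\mathcal R^h_n\le\tau\,a^*_n\max_{1\le j\le m^*_n}\{(\omega_jv_j)^{-1}\} = \tau\,\mathcal R^\omega_n$ for all $h\in\mathcal F_\omega^\tau$.
Employing these estimates, the proof of the next result follows line by line the proof of
Theorem 3.3 and is thus omitted.

Corollary 3.4. Let the assumptions of Theorem 3.3 be satisfied, where we substitute
condition (3.6) by $(m^*_n)^3\max\{|\log\mathcal R^\omega_n|,\ \log m^*_n\} = o(\gamma_{m^*_n})$ as $n\to\infty$. Then we have

$$\sup_{T\in\mathcal T_v^{d,D}}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho,\,h\in\mathcal F_\omega^\tau}\mathbb E|\widehat\ell_{m^*_n}-\ell_h(\varphi)|^2\le C\,\mathcal R^\omega_n$$

for a constant $C > 0$ only depending on the classes $\mathcal F_\gamma^\rho$, $\mathcal T_v^{d,D}$ and the constants $\sigma$, $\eta$.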

3.4. Illustration by classical smoothness assumptions. 

Let us illustrate our general results by considering classical smoothness assumptions. To
simplify the presentation we follow Hall and Horowitz [2005] and suppose that the marginal
distributions of the scalar regressor $Z$ and the scalar instrument $W$ are uniform
on the interval $[0,1]$. All the results below can be easily extended to the multivariate case.
In the univariate case both Hilbert spaces $L^2_Z$ and $L^2_W$ equal $L^2[0,1]$. Moreover,
as a basis in $L^2[0,1]$ we choose the trigonometric basis given by

$$e_1 :\equiv 1,\quad e_{2j}(t) := \sqrt2\cos(2\pi jt),\quad e_{2j+1}(t) := \sqrt2\sin(2\pi jt),\quad t\in[0,1],\ j\in\mathbb N.$$
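The trigonometric basis can be coded directly. The Python sketch below (function names hypothetical) evaluates $e_j$ on a grid and checks orthonormality with a midpoint rule; on an equispaced midpoint grid of $N$ points this rule integrates trigonometric polynomials of frequency below $N$ exactly up to rounding, which is why a modest grid suffices for the first few basis functions.

```python
import numpy as np


def trig_basis(j, t):
    """e_1 = 1, e_{2k}(t) = sqrt(2) cos(2 pi k t), e_{2k+1}(t) = sqrt(2) sin(2 pi k t)."""
    t = np.asarray(t, dtype=float)
    if j == 1:
        return np.ones_like(t)
    k = j // 2
    trig = np.cos if j % 2 == 0 else np.sin
    return np.sqrt(2.0) * trig(2.0 * np.pi * k * t)


def gram_matrix(J, N=64):
    """Midpoint-rule approximation of (<e_j, e_l>)_{j, l <= J} on [0, 1]."""
    t = (np.arange(N) + 0.5) / N
    E = np.stack([trig_basis(j, t) for j in range(1, J + 1)])  # shape (J, N)
    return E @ E.T / N


print(np.allclose(gram_matrix(7), np.eye(7)))
```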

In this subsection the second basis $\{f_l\}_{l\ge1}$ is also given by the trigonometric basis. In this
situation, the moment conditions formalized in Assumption 3 are automatically fulfilled.
Recall the typical choices of the sequences $\gamma$, $\omega$ and $v$ introduced in Remark 2.1. If
$\gamma_j\sim|j|^{2p}$, $p > 0$, as in case (pp) and (pe), then $\mathcal F_\gamma$ coincides with the Sobolev space of
$p$-times differentiable periodic functions (cf. Neubauer [1988a,b]). In case of (ep) it is well
known that $\mathcal F_\gamma$ contains only analytic functions if $p > 1$ (cf. Kawata [1972]). Furthermore,
we consider two special cases describing a "regular decay" of the unknown singular values
of $T$. In case of (pp) and (ep) we consider a polynomial decay of the sequence $v$. Easy
calculus shows that any operator $T$ satisfying the link condition (2.2) acts like integrating
$a$ times and hence is called finitely smoothing (cf. Natterer [1984]). In case of (pe)
we consider an exponential decay of $v$, and it can easily be seen that $T\in\mathcal T_v^d$ implies that
the range of $T$ is contained in $C^\infty[0,1]$; therefore the operator $T$ is called infinitely smoothing
(cf. Mair [1994]). In the next assertion we present the order of the sequences $\mathcal R^h$ and $\mathcal R^\omega$, which
were shown to be minimax optimal.

Proposition 3.5. Assume an iid. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b) with
$T\in\mathcal T_v^{d,D}$ and $P_{U|W}\in\mathcal U_\sigma$. Then for the example configurations of Remark 2.1 we obtain

(pp) $m^*_n\sim n^{1/(2p+2a)}$ and

(i) $$\mathcal R^h_n\sim\begin{cases} n^{-(2p+2s-1)/(2p+2a)}, & \text{if } s-a<1/2,\\ n^{-1}\log n, & \text{if } s-a=1/2,\\ n^{-1}, & \text{otherwise};\end{cases}$$

(ii) $\mathcal R^\omega_n\sim\max\big(n^{-(p+s)/(p+a)},\,n^{-1}\big)$.

(pe) $m^*_n\sim\big(\log(n(\log n)^{-p/a})\big)^{1/(2a)}$ and

(i) $\mathcal R^h_n\sim(\log n)^{-(2p+2s-1)/(2a)}$;

(ii) $\mathcal R^\omega_n\sim(\log n)^{-(p+s)/a}$.

(ep) $m^*_n\sim\big(\log(n(\log n)^{-a/p})\big)^{1/(2p)}$ and

(i) $$\mathcal R^h_n\sim\begin{cases} n^{-1}(\log n)^{(2a-2s+1)/(2p)}, & \text{if } s-a<1/2,\\ n^{-1}\log(\log n), & \text{if } s-a=1/2,\\ n^{-1}, & \text{otherwise};\end{cases}$$

(ii) $\mathcal R^\omega_n\sim\max\big(n^{-1}(\log n)^{(a-s)/p},\,n^{-1}\big)$.

Remark 3.5. As we see from Proposition 3.5, if the value of $a$ increases, the obtainable
optimal rate of convergence decreases. Therefore, the parameter $a$ is often called the degree of
ill-posedness (cf. Natterer [1984]). On the other hand, increasing the value of $p$ or $s$
leads to a faster optimal rate. Moreover, in the cases (pp) and (ep) the parametric rate $n^{-1}$
is obtained independently of the smoothness assumption imposed on the structural function $\varphi$
(however, $p>3/2$ is needed in case (pp)) if the representer is smoother than the degree of
ill-posedness of $T$, i.e., (i) $s>a+1/2$ and (ii) $s\ge a$. Moreover, it is easily seen that if
$[h]_j\sim\exp(-|j|^s)$ or $\omega_j\sim\exp(|j|^{2s})$, $s>0$, then the minimax convergence rates are always
parametric for any polynomial sequences $\gamma$ and $v$. □
Example 3.1. Suppose we are interested in estimating the value $\varphi(t_0)$ of the structural function $\varphi$ evaluated at a point $t_0\in[0,1]$. Consider the representer given by $h_{t_0}:=\sum_{j=1}^\infty e_j(t_0)e_j$. Let $\varphi\in\mathcal F_\gamma$. Since $\sum_{j=1}^\infty\gamma_j^{-1}<\infty$ (cf. Assumption 1), it holds $h_{t_0}\in\mathcal F_{1/\gamma}$ and hence the point evaluation functional at $t_0\in[0,1]$, i.e., $\ell_{h_{t_0}}(\varphi)=\varphi(t_0)$, is well defined. In this case, the estimator $\hat\ell_m$ introduced in (3.1) reads for all $m\ge1$ as
$$\hat\varphi_m(t_0):=\begin{cases}[e(t_0)]_m^t[\hat T]_m^{-1}[\hat g]_m, & \text{if }[\hat T]_m\text{ is nonsingular and }\|[\hat T]_m^{-1}\|\le\sqrt n,\\ 0, & \text{otherwise,}\end{cases}$$
where $[e(t_0)]_m:=(e_1(t_0),\dots,e_m(t_0))^t$ and $\hat\varphi_m$ is an estimator proposed by Johannes and Schwarz [2010]. Let $p\ge3/2$ and $a>0$. Then the estimator $\hat\varphi_{m_n^*}(t_0)$ attains within a constant the minimax optimal rate of convergence $\mathcal R^{h_{t_0}}_n$. Applying Proposition 3.5 (here $s=0$) gives

(pp) $\mathcal R^{h_{t_0}}_n\sim n^{-(2p-1)/(2p+2a)}$,

(pe) $\mathcal R^{h_{t_0}}_n\sim(\log n)^{-(2p-1)/(2a)}$,

(ep) $\mathcal R^{h_{t_0}}_n\sim n^{-1}(\log n)^{(2a+1)/(2p)}$. □
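The thresholded plug-in estimator of Example 3.1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that both bases are the trigonometric basis on $[0,1]$, as in this subsection. It forms $[\hat T]_{j,l}=n^{-1}\sum_i f_j(W_i)e_l(Z_i)$ and $[\hat g]_j=n^{-1}\sum_iY_if_j(W_i)$, and it returns 0 whenever $[\hat T]_m$ is singular or $\|[\hat T]_m^{-1}\|>\sqrt n$. The helper names `trig_basis` and `plugin_estimate` are hypothetical.

```python
import numpy as np


def trig_basis(t, m):
    """Evaluate e_1, ..., e_m of the trigonometric basis on [0, 1] at the points t.

    e_1 = 1, e_{2j} = sqrt(2) cos(2 pi j t), e_{2j+1} = sqrt(2) sin(2 pi j t).
    Returns an array of shape (len(t), m).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cols = [np.ones_like(t)]
    j = 1
    while len(cols) < m:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        j += 1
    return np.column_stack(cols[:m])


def plugin_estimate(h_m, Y, Z, W):
    """Thresholded plug-in estimate [h]_m^t [T_hat]_m^{-1} [g_hat]_m (0 if rejected)."""
    n, m = len(Y), len(h_m)
    E = trig_basis(Z, m)              # E[i, l] = e_l(Z_i)
    F = trig_basis(W, m)              # F[i, j] = f_j(W_i)
    T_hat = F.T @ E / n               # [T_hat]_{j,l} = n^{-1} sum_i f_j(W_i) e_l(Z_i)
    g_hat = F.T @ np.asarray(Y) / n   # [g_hat]_j = n^{-1} sum_i Y_i f_j(W_i)
    try:
        T_inv = np.linalg.inv(T_hat)
    except np.linalg.LinAlgError:     # [T_hat]_m is singular
        return 0.0
    if np.linalg.norm(T_inv, 2) > np.sqrt(n):  # spectral-norm threshold sqrt(n)
        return 0.0
    return float(np.asarray(h_m) @ T_inv @ g_hat)
```

For point evaluation one passes `h_m = trig_basis(t0, m)[0]`, i.e. the vector $[e(t_0)]_m$. For a general representer one passes its first $m$ Fourier coefficients instead.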

Example 3.2. We want to estimate the average value of the structural function $\varphi$ over a certain interval $[0,b]$ with $0<b<1$. The linear functional of interest is given by $\ell_h(\varphi)=\int_0^b\varphi(t)\,dt$ with representer $h:=\mathbb 1_{[0,b]}$. Its Fourier coefficients are given by $[h]_1=b$, $[h]_{2j}=(\sqrt2\pi j)^{-1}\sin(2\pi jb)$, $[h]_{2j+1}=(\sqrt2\pi j)^{-1}\big(1-\cos(2\pi jb)\big)$ for $j\ge1$ and, hence, $[h]_j^2\sim j^{-2}$. Again we assume that $p\ge3/2$ and $a>0$. Then the mean squared error of the estimator $\hat\ell_{m_n^*}=\int_0^b\hat\varphi_{m_n^*}(t)\,dt$ is bounded up to a constant by the minimax rate of convergence $\mathcal R^h_n$. In the three cases (here $s=1$) the order of $\mathcal R^h_n$ is given by
$$\text{(pp)}\quad\mathcal R^h_n\sim\begin{cases}n^{-(2p+1)/(2p+2a)}, & \text{if } a>1/2,\\ n^{-1}\log n, & \text{if } a=1/2,\\ n^{-1}, & \text{otherwise,}\end{cases}$$
$$\text{(pe)}\quad\mathcal R^h_n\sim(\log n)^{-(2p+1)/(2a)},$$
$$\text{(ep)}\quad\mathcal R^h_n\sim\begin{cases}n^{-1}(\log n)^{(2a-1)/(2p)}, & \text{if } a>1/2,\\ n^{-1}\log(\log n), & \text{if } a=1/2,\\ n^{-1}, & \text{otherwise.}\end{cases}$$
As in the direct regression model, where the average value of the regression function can be estimated with rate $n^{-1}$, we obtain the parametric rate in the cases (pp) and (ep) if $a<1/2$. □
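The closed-form coefficients of $h=\mathbb 1_{[0,b]}$ can be checked numerically. The sketch below assumes the basis $e_1=1$, $e_{2j}=\sqrt2\cos(2\pi j\cdot)$, $e_{2j+1}=\sqrt2\sin(2\pi j\cdot)$, which is consistent with the coefficients in Example 3.3. It compares a midpoint-rule quadrature with the formulas above. It also checks the identity $[h]_{2j}^2+[h]_{2j+1}^2=(1-\cos(2\pi jb))/(\pi^2j^2)\le2/(\pi^2j^2)$, which makes the decay $j^{-2}$ explicit. The value $b=0.3$ and the function names are illustrative only.

```python
import math


def trig_e(k, t):
    """k-th trigonometric basis function on [0, 1] (k = 1, 2, ...)."""
    if k == 1:
        return 1.0
    j = k // 2
    if k % 2 == 0:
        return math.sqrt(2) * math.cos(2 * math.pi * j * t)
    return math.sqrt(2) * math.sin(2 * math.pi * j * t)


def coeff_numeric(func, k, num=20000):
    """Midpoint-rule approximation of [func]_k = int_0^1 func(t) e_k(t) dt."""
    dt = 1.0 / num
    return dt * sum(func((i + 0.5) * dt) * trig_e(k, (i + 0.5) * dt) for i in range(num))


def indicator_coeff(k, b):
    """Closed-form Fourier coefficients of h = 1_[0,b] in the basis above."""
    if k == 1:
        return b
    j = k // 2
    if k % 2 == 0:
        return math.sin(2 * math.pi * j * b) / (math.sqrt(2) * math.pi * j)
    return (1 - math.cos(2 * math.pi * j * b)) / (math.sqrt(2) * math.pi * j)
```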

Example 3.3. Consider estimation of the weighted average derivative of the structural function $\varphi$ with weight function $H$, i.e., $\int_0^1\varphi'(t)H(t)\,dt$. This functional is useful not only for estimating scaled coefficients of an index model, but also for quantifying the average slope of structural functions. Assume that the weight function $H$ is continuously differentiable and vanishes at the boundary of the support of $Z$, i.e., $H(0)=H(1)=0$. Integration by parts gives $\int_0^1\varphi'(t)H(t)\,dt=-\int_0^1\varphi(t)h(t)\,dt=-\ell_h(\varphi)$ with representer $h$ given by the derivative of $H$. The weighted average derivative estimator $-\hat\ell_{m_n^*}=-\int_0^1\hat\varphi_{m_n^*}(t)h(t)\,dt$ is minimax optimal. As an illustration, consider the specific weight function $H(t)=1-(2t-1)^2$ with derivative $h(t)=4(1-2t)$ for $0\le t\le1$. It is easily seen that the Fourier coefficients of the representer $h$ are $[h]_1=0$, $[h]_{2j}=0$, $[h]_{2j+1}=4\sqrt2(\pi j)^{-1}$ for $j\ge1$ and, thus, $[h]_{2j+1}^2\sim j^{-2}$. Hence, for this particular choice of the weight function $H$ the estimator $-\hat\ell_{m_n^*}$ attains up to a constant the optimal rate $\mathcal R^h_n$, which was already specified in Example 3.2. □
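A quick numerical sanity check of the two facts used in Example 3.3 follows. The first is the integration-by-parts identity $\int_0^1\varphi'H=-\int_0^1\varphi h$. The second is the sine coefficients $[h]_{2j+1}=4\sqrt2/(\pi j)$. The test function $\varphi(t)=\sin(3t)+t^2$ is a hypothetical choice for illustration only. Integrals are approximated by the midpoint rule.

```python
import math


def midpoint(f, num=10000):
    """Midpoint-rule approximation of int_0^1 f(t) dt."""
    dt = 1.0 / num
    return dt * sum(f((i + 0.5) * dt) for i in range(num))


H = lambda t: 1 - (2 * t - 1) ** 2   # weight function with H(0) = H(1) = 0
h = lambda t: 4 * (1 - 2 * t)        # representer h = H'

# Hypothetical smooth test function and its derivative.
phi = lambda t: math.sin(3 * t) + t ** 2
dphi = lambda t: 3 * math.cos(3 * t) + 2 * t

weighted_avg_derivative = midpoint(lambda t: dphi(t) * H(t))
minus_ell_h = -midpoint(lambda t: phi(t) * h(t))

# Sine coefficients [h]_{2j+1} = int_0^1 h(t) sqrt(2) sin(2 pi j t) dt, j = 1, ..., 5.
sine_coeffs = [midpoint(lambda t, j=j: h(t) * math.sqrt(2) * math.sin(2 * math.pi * j * t))
               for j in range(1, 6)]
```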



4. Adaptive estimation 

In this section we derive an adaptive estimation procedure for the value of the linear functional $\ell_h(\varphi)$. This procedure is based on the estimator $\hat\ell_{\hat m}$ given in (3.1), with dimension parameter $\hat m$ selected as a minimizer of the data-driven penalized contrast criterion (1.2a-1.2b). The selection criterion (1.2a-1.2b) involves the random upper bound $\hat M_n$ and the random penalty sequence $\widehat{\mathrm{pen}}$, which we introduce below. We show that the estimator $\hat\ell_{\hat m}$ attains the minimax rate of convergence within a logarithmic term. Moreover, we illustrate the cost due to adaptation by considering classical smoothness assumptions.

In an intermediate step we do not consider the estimation of unknown quantities in the penalty function. Let us therefore consider a deterministic upper bound $M_n$ and a deterministic penalty sequence $\mathrm{pen}:=(\mathrm{pen}_m)_{m\ge1}$, which is nondecreasing. These quantities are constructed such that they can be easily estimated in a second step. As an adaptive choice $\tilde m$ of the dimension parameter $m$ we propose the minimizer of a penalized contrast criterion, that is,
$$\tilde m:=\mathop{\arg\min}_{1\le m\le M_n}\{\Psi_m+\mathrm{pen}_m\},\tag{4.1a}$$
where the random sequence of contrasts $\Psi:=(\Psi_m)_{m\ge1}$ is defined by
$$\Psi_m:=\max_{m\le m'\le M_n}\big(|\hat\ell_{m'}-\hat\ell_m|^2-\mathrm{pen}_{m'}\big)_+.\tag{4.1b}$$

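The fundamental idea to establish an appropriate upper bound for the risk of $\hat\ell_{\tilde m}$ is given by the following reduction scheme. Let us denote $m\wedge m':=\min(m,m')$. Due to the definition of $\Psi$ and $\tilde m$ we deduce for all $1\le m\le M_n$
$$\begin{aligned}|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2&\le3\big\{|\hat\ell_{\tilde m}-\hat\ell_{\tilde m\wedge m}|^2+|\hat\ell_{\tilde m\wedge m}-\hat\ell_m|^2+|\hat\ell_m-\ell_h(\varphi)|^2\big\}\\&\le3\big\{\Psi_m+\mathrm{pen}_{\tilde m}+\Psi_{\tilde m}+\mathrm{pen}_m+|\hat\ell_m-\ell_h(\varphi)|^2\big\}\\&\le6\{\Psi_m+\mathrm{pen}_m\}+3|\hat\ell_m-\ell_h(\varphi)|^2.\end{aligned}$$
Here the second step uses that if $\tilde m\le m$ the first term vanishes and the second is at most $\Psi_{\tilde m}+\mathrm{pen}_m$, while if $\tilde m>m$ the second term vanishes and the first is at most $\Psi_m+\mathrm{pen}_{\tilde m}$. The last step uses $\Psi_{\tilde m}+\mathrm{pen}_{\tilde m}\le\Psi_m+\mathrm{pen}_m$. The right hand side does not depend on the adaptive choice $\tilde m$. Since the penalty sequence $\mathrm{pen}$ is nondecreasing we obtain
$$\Psi_m\le6\max_{m\le m'\le M_n}\Big(|\hat\ell_{m'}-\ell_h(\varphi_{m'})|^2-\tfrac16\mathrm{pen}_{m'}\Big)_++3\max_{m\le m'\le M_n}|\ell_h(\varphi_m-\varphi_{m'})|^2.$$
Combining the last estimate with the previous reduction scheme yields for all $1\le m\le M_n$
$$|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2\le7\,\mathrm{pen}_m+78\,\mathrm{bias}_m+42\max_{m\le m'\le M_n}\Big(|\hat\ell_{m'}-\ell_h(\varphi_{m'})|^2-\tfrac16\mathrm{pen}_{m'}\Big)_+,\tag{4.2}$$
where $\mathrm{bias}_m:=\sup_{m'\ge m}|\ell_h(\varphi_{m'}-\varphi)|^2$. We will prove below that $\mathrm{pen}_m+\mathrm{bias}_m$ is of the order $\mathcal R^h_{n(1+\log n)^{-1}}$. Moreover, we will bound the last right hand side term appropriately with the help of Bernstein's inequality.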
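The selection rule (4.1a-4.1b) and the deterministic bound (4.2) can be illustrated with a short sketch. It is a toy check, not part of the paper. Arbitrary numbers stand in for $\hat\ell_m$, $\ell_h(\varphi_m)$, $\ell_h(\varphi)$ and a nondecreasing penalty, and the sketch verifies the reduction scheme and (4.2) for every $m$. The supremum in $\mathrm{bias}_m$ is truncated at $M_n$, which only makes the right hand side smaller, and the derivation above only uses $m'\le M_n$. The function names are hypothetical.

```python
import random


def select_dimension(est, pen):
    """Penalized contrast rule (4.1a)-(4.1b) with a deterministic penalty.

    est[k] and pen[k] hold hat_ell_m and pen_m for m = k + 1 (k = 0, ..., M_n - 1).
    Returns the selected dimension (1-based) and the contrasts Psi_m.
    """
    M = len(est)
    psi = [max(max((est[kp] - est[k]) ** 2 - pen[kp], 0.0) for kp in range(k, M))
           for k in range(M)]
    k_best = min(range(M), key=lambda k: psi[k] + pen[k])  # first minimizer
    return k_best + 1, psi


def bound_42_holds(est, approx, target, pen):
    """Check the reduction scheme and inequality (4.2) for every 1 <= m <= M_n.

    approx[k] plays the role of ell_h(phi_m) and target that of ell_h(phi).
    """
    m_sel, psi = select_dimension(est, pen)
    M = len(est)
    lhs = (est[m_sel - 1] - target) ** 2
    for k in range(M):
        reduction = 6 * (psi[k] + pen[k]) + 3 * (est[k] - target) ** 2
        bias = max((approx[kp] - target) ** 2 for kp in range(k, M))
        dev = max(max((est[kp] - approx[kp]) ** 2 - pen[kp] / 6, 0.0) for kp in range(k, M))
        if lhs > reduction + 1e-12 or lhs > 7 * pen[k] + 78 * bias + 42 * dev + 1e-12:
            return False
    return True
```

Ties in the argmin are resolved by taking the smallest minimizer. The inequalities only use the minimality of $\Psi_{\tilde m}+\mathrm{pen}_{\tilde m}$, so any tie-breaking rule would do.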
Let us now introduce the upper bound $M_n$ and the penalty sequence $\mathrm{pen}$ used in the penalized contrast criterion (4.1a-4.1b). In the following, assume without loss of generality that $[h]_1\ne0$.

Definition 4.1. For all $n\ge1$ let $a_n:=n^{1-1/\log(2+\log n)}(1+\log n)^{-1}$ and $M_n^h:=\max\{1\le m\le\lfloor n^{1/4}\rfloor:\max_{1\le j\le m}[h]_j^2\le n[h]_1^2\}$; then we define
$$M_n:=\min\Big\{2\le m\le M_n^h:\ m^3\|[T]_m^{-1}\|^2\max_{1\le j\le m}[h]_j^2>a_n\Big\}-1,$$
where we set $M_n:=M_n^h$ if the min runs over an empty set. Thus, $M_n$ takes values between 1 and $M_n^h$. Let $\varsigma_m:=74\big(\mathbb E[Y^2]+\max_{1\le m'\le m}\|[T]_{m'}^{-1}[g]_{m'}\|^2\big)$; then we define
$$\mathrm{pen}_m:=24\,\varsigma_m(1+\log n)n^{-1}\max_{1\le m'\le m}\|[h]_{m'}^t[T]_{m'}^{-1}\|^2.\tag{4.3}$$
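The deterministic quantities of Definition 4.1 can be computed directly once $[T]_m$, $[g]_m$ and $\mathbb E[Y^2]$ are given. The sketch below is a minimal illustration under hypothetical inputs: a diagonal operator $[T]_m=\mathrm{diag}(1,2^{-1},\dots,m^{-1})$, i.e. $v_j=j^{-2}$, the representer $[h]_j=j^{-1}$, $[\varphi]_j=j^{-2}$ and $\mathbb E[Y^2]=1$. It computes $a_n$, $M_n^h$, $M_n$ and $\mathrm{pen}_m$. In practice these inputs are unknown, which is why Definition 4.2 replaces them by estimates.

```python
import math

import numpy as np


def a_n(n):
    """a_n = n^{1 - 1/log(2 + log n)} (1 + log n)^{-1} from Definition 4.1."""
    return n ** (1 - 1 / math.log(2 + math.log(n))) / (1 + math.log(n))


def M_h(n, h):
    """M_n^h = max{1 <= m <= floor(n^{1/4}) : max_{j<=m} [h]_j^2 <= n [h]_1^2}."""
    upper = min(math.floor(n ** 0.25), len(h))
    return max(m for m in range(1, upper + 1) if np.max(h[:m] ** 2) <= n * h[0] ** 2)


def M_n(n, h, T):
    """Upper bound M_n of Definition 4.1; T(m) returns the m x m matrix [T]_m."""
    Mh, an = M_h(n, h), a_n(n)
    for m in range(2, Mh + 1):
        T_inv_norm = np.linalg.norm(np.linalg.inv(T(m)), 2)   # spectral norm
        if m ** 3 * T_inv_norm ** 2 * np.max(h[:m] ** 2) > an:
            return m - 1
    return Mh


def pen(m, n, h, T, g, EY2):
    """Penalty (4.3): 24 varsigma_m (1 + log n) n^{-1} max_{m'<=m} ||[h]_{m'}^t [T]_{m'}^{-1}||^2."""
    sol = max(np.sum(np.linalg.solve(T(k), g[:k]) ** 2) for k in range(1, m + 1))
    lam = max(np.sum((h[:k] @ np.linalg.inv(T(k))) ** 2) for k in range(1, m + 1))
    varsigma = 74 * (EY2 + sol)
    return 24 * varsigma * (1 + math.log(n)) / n * lam
```

With these inputs and $n=10^6$ one gets $a_n\approx453$, $M_n^h=31$ and $M_n=3$, since $3^5=243<a_n<4^5$. Both maxima in $\mathrm{pen}_m$ make the penalty nondecreasing in $m$, as required in the reduction scheme.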

To apply Bernstein's inequality we need another assumption regarding the error term $U$. This is captured by the set $\mathcal U_\sigma^\infty$ for some $\sigma>0$, which contains all conditional distributions $P_{U|W}$ such that $\mathbb E[U|W]=0$, $\mathbb E[U^2|W]\le\sigma^2$, and Cramér's condition holds, i.e.,
$$\mathbb E[|U|^k\,|\,W]\le\sigma^kk!,\qquad k=3,4,\dots.$$
Moreover, besides Assumption 3 we need the following Cramér condition, which is in particular satisfied if the basis $\{f_j\}_{j\ge1}$ is uniformly bounded.

Assumption 4. There exists $\eta\ge1$ such that the distribution of $W$ satisfies
$$\sup_{j,l\in\mathbb N}\mathbb E\big|f_j(W)f_l(W)-\mathbb E[f_j(W)f_l(W)]\big|^k\le\eta^kk!,\qquad k=3,4,\dots.$$

We now present an upper bound for $\hat\ell_{\tilde m}$. As usual in the context of adaptive estimation of functionals, we face a logarithmic loss due to the adaptation.

Theorem 4.1. Assume an i.i.d. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b) with $\mathbb E[Y^2]>0$. Let Assumptions 1-4 be satisfied. Suppose that $(m_n^\circ)^3\max_{1\le j\le m_n^\circ}[h]_j^2=o(a_nv_{m_n^\circ})$ as $n\to\infty$. Then we have for all $n\ge1$
$$\sup_{T\in\mathcal T_d^D}\ \sup_{P_{U|W}\in\mathcal U_\sigma^\infty}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2\le C\,\mathcal R^h_{n(1+\log n)^{-1}}$$
for a constant $C>0$ only depending on the classes $\mathcal F_\gamma^\rho$, $\mathcal T_d^D$, the constants $\sigma$, $\eta$ and the representer $h$.

Remark 4.1. In all examples studied below the condition $(m_n^\circ)^3\max_{1\le j\le m_n^\circ}[h]_j^2=o(a_nv_{m_n^\circ})$ as $n$ tends to infinity is satisfied if the structural function $\varphi$ is sufficiently smooth. More precisely, in case of (pp) it suffices to assume $3<2p+2\min(0,s)$. On the other hand, in case of (pe) or (ep) this condition is automatically fulfilled. □

In the following definition we introduce empirical versions of the integer $M_n$ and the penalty sequence $\mathrm{pen}$. Thereby, we complete the data-driven penalized contrast criterion (1.2a-1.2b), which allows for a completely data-driven selection method. For this purpose, we construct an estimator of $\varsigma_m$ by replacing the unknown quantities by their empirical analogues, that is,
$$\hat\varsigma_m:=74\Big(n^{-1}\sum_{i=1}^nY_i^2+\max_{1\le m'\le m}\|[\hat T]_{m'}^{-1}[\hat g]_{m'}\|^2\Big).$$
With the nondecreasing sequence $(\hat\varsigma_m)_{m\ge1}$ at hand we only need to replace the matrix $[T]_m$ by its empirical counterpart (cf. Subsection 3.1).

Definition 4.2. Let $a_n$ and $M_n^h$ be as in Definition 4.1; then for all $n\ge1$ define
$$\hat M_n:=\min\Big\{2\le m\le M_n^h:\ m^3\|[\hat T]_m^{-1}\|^2\max_{1\le j\le m}[h]_j^2>a_n\Big\}-1,$$
where we set $\hat M_n:=M_n^h$ if the min runs over an empty set. Thus, $\hat M_n$ takes values between 1 and $M_n^h$. Then we introduce for all $m\ge1$ an empirical analogue of $\mathrm{pen}_m$ by
$$\widehat{\mathrm{pen}}_m:=204\,\hat\varsigma_m(1+\log n)n^{-1}\max_{1\le m'\le m}\|[h]_{m'}^t[\hat T]_{m'}^{-1}\|^2.\tag{4.4}$$
Before we establish the next upper bound we introduce
$$M_n^+:=\min\Big\{2\le m\le M_n^h:\ v_m^{-1}m^3\max_{1\le j\le m}[h]_j^2>4Da_n\Big\}-1,\tag{4.5}$$
where $M_n^+:=M_n^h$ if the min runs over an empty set. Thus, $M_n^+$ takes values between 1 and $M_n^h$. As in the partially adaptive case, we do not attain the minimax rate of convergence $\mathcal R^h_n$ up to a constant. A logarithmic term must be paid for adaptation, as we see in the next assertion.

Theorem 4.2. Let the assumptions of Theorem 4.1 be satisfied. Additionally suppose that $(M_n^++1)^2\log n=o(nv_{M_n^++1})$ as $n\to\infty$ and $\sup_{j\ge1}\mathbb E|e_j(Z)|^{20}\le\eta^{20}$. Then for all $n\ge1$ we have
$$\sup_{T\in\mathcal T_d^D}\ \sup_{P_{U|W}\in\mathcal U_\sigma^\infty}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\hat\ell_{\hat m}-\ell_h(\varphi)|^2\le C\,\mathcal R^h_{n(1+\log n)^{-1}}$$
for a constant $C>0$ only depending on the classes $\mathcal F_\gamma^\rho$, $\mathcal T_d^D$, the constants $\sigma$, $\eta$ and the representer $h$.

Remark 4.2. Note that in all examples illustrating Theorem 4.2 below, the condition $(M_n^++1)^2\log n=o(nv_{M_n^++1})$ as $n$ tends to infinity is automatically satisfied. □

As in the case of minimax optimal estimation we now present an upper bound uniformly over the class $\mathcal F_\omega^\tau$ of representers. For this purpose define $M_n^\omega:=\max\{1\le m\le\lfloor n^{1/4}\rfloor:\max_{1\le j\le m}\omega_j^{-1}\le n\}$. In the definitions of the bounds $M_n$, $M_n^+$, and $M_n^-$ (cf. Appendix A.4) we replace $M_n^h$ and $\max_{1\le j\le m}[h]_j^2$ by $M_n^\omega$ and $\max_{1\le j\le m}\omega_j^{-1}$, respectively. Consequently, by employing $\|h\|_\omega^2\le\tau$ and $\mathcal R^h_n\le\tau\mathcal R^\omega_n$ for all $h\in\mathcal F_\omega^\tau$, the next result follows line by line the proof of Theorem 4.2, and hence its proof is omitted.

Corollary 4.3. Under the conditions of Theorem 4.2 we have for all $n\ge1$
$$\sup_{h\in\mathcal F_\omega^\tau}\ \sup_{T\in\mathcal T_d^D}\ \sup_{P_{U|W}\in\mathcal U_\sigma^\infty}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\hat\ell_{\hat m}-\ell_h(\varphi)|^2\le C\,\mathcal R^\omega_{n(1+\log n)^{-1}},$$
where the constant $C>0$ depends on the parameter spaces $\mathcal F_\gamma^\rho$, $\mathcal T_d^D$, and the constants $\sigma$, $\eta$.

Illustration by classical smoothness assumptions. Let us illustrate the cost due to adaptation by considering classical smoothness assumptions as discussed in Subsection 3.4. In Theorem 4.2 and Corollary 4.3, respectively, we have seen that the adaptive estimator $\hat\ell_{\hat m}$ attains within a constant the rates $\mathcal R^h_{n(1+\log n)^{-1}}$ and $\mathcal R^\omega_{n(1+\log n)^{-1}}$. Let us now present the orders of these rates by considering the example configurations of Remark 2.1. The proof of the following result is omitted because of its analogy with the proof of Proposition 3.5.

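Proposition 4.4. Assume an i.i.d. $n$-sample of $(Y,Z,W)$ from the model (1.1a-1.1b) with conditional expectation operator $T\in\mathcal T_d^D$, error term $U$ such that $P_{U|W}\in\mathcal U_\sigma^\infty$, and $\mathbb E[Y^2]>0$. Then for the example configurations of Remark 2.1 we obtain

(pp) if in addition $3<2p+2\min(s,0)$, then $m_n^\circ\sim\big(n(1+\log n)^{-1}\big)^{1/(2p+2a)}$ and
$$\text{(i)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim\begin{cases}\big(n^{-1}(1+\log n)\big)^{(2p+2s-1)/(2p+2a)}, & \text{if } s-a<1/2,\\ n^{-1}(1+\log n)^2, & \text{if } s-a=1/2,\\ n^{-1}(1+\log n), & \text{if } s-a>1/2,\end{cases}$$
$$\text{(ii)}\quad\mathcal R^\omega_{n(1+\log n)^{-1}}\sim\max\Big(\big(n^{-1}(1+\log n)\big)^{(p+s)/(p+a)},\,n^{-1}(1+\log n)\Big).$$

(pe) $m_n^\circ\sim\big(\log[n(1+\log n)^{-(a+p)/a}]\big)^{1/(2a)}$ and
$$\text{(i)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim(1+\log n)^{-(2p+2s-1)/(2a)},\qquad\text{(ii)}\quad\mathcal R^\omega_{n(1+\log n)^{-1}}\sim(1+\log n)^{-(p+s)/a}.$$

(ep) $m_n^\circ\sim\big(\log[n(1+\log n)^{-(a+p)/p}]\big)^{1/(2p)}$ and
$$\text{(i)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim\begin{cases}n^{-1}(1+\log n)^{(2a+2p-2s+1)/(2p)}, & \text{if } s-a<1/2,\\ n^{-1}(1+\log n)\log\log n, & \text{if } s-a=1/2,\\ n^{-1}(1+\log n), & \text{if } s-a>1/2,\end{cases}$$
$$\text{(ii)}\quad\mathcal R^\omega_{n(1+\log n)^{-1}}\sim\max\big(n^{-1}(\log n)^{(a+p-s)/p},\,n^{-1}(1+\log n)\big).$$

Let us revisit Examples 3.1 and 3.2. In the following, we apply the general theory to adaptive pointwise estimation and to adaptive estimation of averages of the structural function $\varphi$.

Example 4.1. Consider the point evaluation functional $\ell_{h_{t_0}}(\varphi)=\varphi(t_0)$, $t_0\in[0,1]$, introduced in Example 3.1. In this case, the estimator $\hat\ell_{\hat m}$ with dimension parameter $\hat m$ selected as a minimizer of criterion (1.2a-1.2b) reads as
$$\hat\varphi_{\hat m}(t_0)=\begin{cases}[e(t_0)]_{\hat m}^t[\hat T]_{\hat m}^{-1}[\hat g]_{\hat m}, & \text{if }[\hat T]_{\hat m}\text{ is nonsingular and }\|[\hat T]_{\hat m}^{-1}\|\le\sqrt n,\\ 0, & \text{otherwise,}\end{cases}$$
where $\hat\varphi_m$ is an estimator proposed by Johannes and Schwarz [2010]. Then $\hat\varphi_{\hat m}(t_0)$ attains within a constant the rate of convergence $\mathcal R^{h_{t_0}}_{n(1+\log n)^{-1}}$. Applying Proposition 4.4 (here $s=0$) gives

(pp) $\mathcal R^{h_{t_0}}_{n(1+\log n)^{-1}}\sim\big(n^{-1}(1+\log n)\big)^{(2p-1)/(2p+2a)}$,

(pe) $\mathcal R^{h_{t_0}}_{n(1+\log n)^{-1}}\sim(1+\log n)^{-(2p-1)/(2a)}$,

(ep) $\mathcal R^{h_{t_0}}_{n(1+\log n)^{-1}}\sim n^{-1}(1+\log n)^{(2p+2a+1)/(2p)}$. □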
Example 4.2. Consider the linear functional $\ell_h(\varphi)=\int_0^b\varphi(t)\,dt$ with representer $h:=\mathbb 1_{[0,b]}$ introduced in Example 3.2. The mean squared error of the estimator $\hat\ell_{\hat m}=\int_0^b\hat\varphi_{\hat m}(t)\,dt$ is bounded up to a constant by $\mathcal R^h_{n(1+\log n)^{-1}}$. Applying Proposition 4.4 (here $s=1$) gives
$$\text{(pp)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim\begin{cases}\big(n^{-1}(1+\log n)\big)^{(2p+1)/(2p+2a)}, & \text{if } a>1/2,\\ n^{-1}(1+\log n)^2, & \text{if } a=1/2,\\ n^{-1}(1+\log n), & \text{otherwise,}\end{cases}$$
$$\text{(pe)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim(1+\log n)^{-(2p+1)/(2a)},$$
$$\text{(ep)}\quad\mathcal R^h_{n(1+\log n)^{-1}}\sim\begin{cases}n^{-1}(1+\log n)^{(2a+2p-1)/(2p)}, & \text{if } a>1/2,\\ n^{-1}(1+\log n)\log\log n, & \text{if } a=1/2,\\ n^{-1}(1+\log n), & \text{otherwise.}\end{cases}$$
□



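A. Appendix

A.1. Proof of the lower bound given in Section 2.

Consider
$$\varphi_*:=\Big(\frac{\zeta\kappa_n}{n}\Big)^{1/2}\sum_{j=1}^{m_n^*}[h]_jv_j^{-1}e_j\quad\text{with}\quad\zeta:=\min\big(1/(2d),\rho\big).$$
Since $(\gamma_j^{-1}v_j)_{j\ge1}$ is nonincreasing and by using the definition of $\kappa_n$ given in (2.5), it follows that $\varphi_*$ and in particular $\varphi_\theta:=\theta\varphi_*$ for $\theta\in\{-1,1\}$ belong to $\mathcal F_\gamma^\rho$. Let $V$ be a Gaussian random variable with mean zero and variance one ($V\sim\mathcal N(0,1)$) which is independent of $(Z,W)$. Consider $U_\theta:=[T\varphi_\theta](W)-\varphi_\theta(Z)+V$; then $P_{U_\theta|W}$ belongs to $\mathcal U_\sigma$ for all $\sigma^4\ge\big(\sqrt3+4\rho\eta^2\sum_{j\ge1}\gamma_j^{-1}\big)^2$, which can be realized as follows. Obviously, we have $\mathbb E[U_\theta|W]=0$. Moreover, $\sup_j\mathbb E[e_j^4(Z)|W]\le\eta^4$ implies
$$\mathbb E[\varphi_\theta^4(Z)|W]\le\rho^2\,\mathbb E\Big[\Big(\sum_{j\ge1}\gamma_j^{-1}e_j^2(Z)\Big)^2\Big|W\Big]\le\rho^2\eta^4\Big(\sum_{j\ge1}\gamma_j^{-1}\Big)^2,$$
and thus $|[T\varphi_\theta](W)|^4\le\mathbb E[\varphi_\theta^4(Z)|W]\le\rho^2\eta^4\big(\sum_{j\ge1}\gamma_j^{-1}\big)^2$. From the last two bounds we deduce $\mathbb E[U_\theta^4|W]\le16\,\mathbb E[\varphi_\theta^4(Z)|W]+6\operatorname{Var}(\varphi_\theta(Z)|W)+3\le\big(\sqrt3+4\rho\eta^2\sum_{j\ge1}\gamma_j^{-1}\big)^2$. Consequently, for each $\theta$, i.i.d. copies $(Y_i,Z_i,W_i)$, $1\le i\le n$, of $(Y,Z,W)$ with $Y:=\varphi_\theta(Z)+U_\theta$ form an $n$-sample of the model (1.1a-1.1b). We denote their joint distribution by $P_\theta$ and by $\mathbb E_\theta$ the expectation with respect to $P_\theta$. Under $P_\theta$ the conditional distribution of $Y$ given $W$ is Gaussian with mean $[T\varphi_\theta](W)$ and variance 1. The log-likelihood of $P_1$ with respect to $P_{-1}$ is given by
$$\log\frac{dP_1}{dP_{-1}}=\sum_{i=1}^n2\big(Y_i-[T\varphi_{-1}](W_i)\big)[T\varphi_*](W_i)-\sum_{i=1}^n2\big|[T\varphi_*](W_i)\big|^2.$$
Since $T\in\mathcal T_d^D$, the Kullback-Leibler divergence satisfies $KL(P_1,P_{-1})=\mathbb E_1[\log(dP_1/dP_{-1})]=2n\|T\varphi_*\|^2\le2nd\|\varphi_*\|_v^2$. It is well known that the Hellinger distance $H(P_1,P_{-1})$ satisfies $H^2(P_1,P_{-1})\le KL(P_1,P_{-1})$ and thus, employing again the definition of $\kappa_n$, we have
$$H^2(P_1,P_{-1})\le2nd\sum_{j=1}^{m_n^*}[\varphi_*]_j^2v_j\le2d\zeta\le1.\tag{A.1}$$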
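Consider the Hellinger affinity $\varrho(P_1,P_{-1})=\int\sqrt{dP_1\,dP_{-1}}$; then for any estimator $\breve\ell$ it holds
$$\varrho(P_1,P_{-1})\le\frac{1}{2|\ell_h(\varphi_*)|}\Big\{\big(\mathbb E_1|\breve\ell-\ell_h(\varphi_1)|^2\big)^{1/2}+\big(\mathbb E_{-1}|\breve\ell-\ell_h(\varphi_{-1})|^2\big)^{1/2}\Big\}.\tag{A.2}$$
Due to the identity $\varrho(P_1,P_{-1})=1-\frac12H^2(P_1,P_{-1})$, combining (A.1) with (A.2) yields
$$\mathbb E_1|\breve\ell-\ell_h(\varphi_1)|^2+\mathbb E_{-1}|\breve\ell-\ell_h(\varphi_{-1})|^2\ge\frac12|\ell_h(\varphi_*)|^2.\tag{A.3}$$
Obviously, $|\ell_h(\varphi_*)|^2=\frac{\zeta\kappa_n}{n}\big(\sum_{j=1}^{m_n^*}[h]_j^2v_j^{-1}\big)^2$. From (A.3) together with the last identity we conclude for any possible estimator $\breve\ell$
$$\begin{aligned}\sup_{T\in\mathcal T_d^D}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\breve\ell-\ell_h(\varphi)|^2&\ge\sup_{\theta\in\{-1,1\}}\mathbb E_\theta|\breve\ell-\ell_h(\varphi_\theta)|^2\\&\ge\frac12\big\{\mathbb E_1|\breve\ell-\ell_h(\varphi_1)|^2+\mathbb E_{-1}|\breve\ell-\ell_h(\varphi_{-1})|^2\big\}\\&\ge\frac{\zeta\kappa_n}{4n}\Big(\sum_{j=1}^{m_n^*}[h]_j^2v_j^{-1}\Big)^2.\end{aligned}\tag{A.4}$$
Consider now $\varphi_*:=\big(\zeta\big/\sum_{j>m_n^*}[h]_j^2\gamma_j^{-1}\big)^{1/2}\sum_{j>m_n^*}[h]_j\gamma_j^{-1}e_j$, which belongs to $\mathcal F_\gamma^\rho$ since $\|\varphi_*\|_\gamma^2=\zeta\le\rho$. Moreover, since $(\gamma_j^{-1}v_j)_{j\ge1}$ is nonincreasing and by using the definition of $\kappa_n$ given in (2.5), we have
$$2nd\sum_{j>m_n^*}[\varphi_*]_j^2v_j\le2d\zeta\le1.$$
Thereby, following line by line the proof of (A.4), we obtain for any possible estimator $\breve\ell$
$$\sup_{T\in\mathcal T_d^D}\ \sup_{P_{U|W}\in\mathcal U_\sigma}\ \sup_{\varphi\in\mathcal F_\gamma^\rho}\mathbb E|\breve\ell-\ell_h(\varphi)|^2\ge\frac14|\ell_h(\varphi_*)|^2=\frac14\min\Big(\frac1{2d},\rho\Big)\sum_{j>m_n^*}[h]_j^2\gamma_j^{-1}.$$
Combining the last estimate and (A.4) implies the result of the theorem, which completes the proof. □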

A.2. Proofs of Section 3.

We begin by defining and recalling notations to be used in the proofs of this section. For $m\ge1$ recall $\varphi_m=\sum_{j=1}^m[\varphi_m]_je_j$ with $[\varphi_m]_m=[T]_m^{-1}[g]_m$, keeping in mind that $[T]_m$ is nonsingular. Then the identities $[T(\varphi-\varphi_m)]_m=0$ and $[\varphi_m-E_m\varphi]_m=[T]_m^{-1}[TE_m^\perp\varphi]_m$ hold true. We denote $Q_m:=[\hat T]_m-[T]_m$ and $V_m:=[\hat g]_m-[\hat T]_m[\varphi_m]_m=n^{-1}\sum_{i=1}^n\big(U_i+\varphi(Z_i)-\varphi_m(Z_i)\big)[f(W_i)]_m$, where obviously $\mathbb EV_m=0$. Moreover, let us introduce the events
$$\Omega_m:=\{\|[\hat T]_m^{-1}\|\le\sqrt n\},\qquad\mho_m:=\{\sqrt m\,\|Q_m\|\,\|[T]_m^{-1}\|\le1/2\},$$
$$\Omega_m^c=\{\|[\hat T]_m^{-1}\|>\sqrt n\}\quad\text{and}\quad\mho_m^c=\{\sqrt m\,\|Q_m\|\,\|[T]_m^{-1}\|>1/2\}.$$
Observe that if $\sqrt m\|Q_m\|\|[T]_m^{-1}\|\le1/2$, then the identity $[\hat T]_m=[T]_m\{I+[T]_m^{-1}Q_m\}$ implies by the usual Neumann series argument that $\|[\hat T]_m^{-1}\|\le2\|[T]_m^{-1}\|$. Thereby, if $\sqrt n\ge2\|[T]_m^{-1}\|$ we have $\mho_m\subset\Omega_m$. These results will be used below without further reference. We shall prove at the end of this section four technical lemmata (A.1-A.4) which are used in the following proofs. Furthermore, we will denote by $C$ universal numerical constants and by $C(\cdot)$ constants depending only on the arguments. In both cases, the values of the constants may change from line to line.
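The Neumann series bound can be checked numerically. If $\|Q_m\|\,\|[T]_m^{-1}\|\le1/2$, then $\|([T]_m+Q_m)^{-1}\|\le\|[T]_m^{-1}\|/(1-\|[T]_m^{-1}Q_m\|)\le2\|[T]_m^{-1}\|$. The sketch below draws a random, hypothetical nonsingular $[T]_m$ and a perturbation $Q_m$ scaled to the boundary case $\|Q_m\|\|[T]_m^{-1}\|=1/2$, and tests the bound in the spectral norm. The additional factor $\sqrt m\ge1$ in the event $\mho_m$ only makes the condition stronger.

```python
import numpy as np


def neumann_bound_holds(m, rng):
    """Draw [T]_m and Q_m with ||Q_m|| ||[T]_m^{-1}|| = 1/2 and test
    ||([T]_m + Q_m)^{-1}|| <= 2 ||[T]_m^{-1}|| (spectral norms)."""
    T = rng.normal(size=(m, m)) + m * np.eye(m)       # hypothetical nonsingular [T]_m
    T_inv_norm = np.linalg.norm(np.linalg.inv(T), 2)
    Q = rng.normal(size=(m, m))
    Q *= 0.5 / (np.linalg.norm(Q, 2) * T_inv_norm)    # boundary case ||Q|| ||T^{-1}|| = 1/2
    T_hat_inv_norm = np.linalg.norm(np.linalg.inv(T + Q), 2)
    return T_hat_inv_norm <= 2 * T_inv_norm * (1 + 1e-10)
```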

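Proof of the consistency.

Proof of Proposition 3.1. Consider for all $m\ge1$ the decomposition
$$\begin{aligned}\mathbb E|\hat\ell_m-\ell_h(\varphi)|^2&=\mathbb E|\hat\ell_m-\ell_h(\varphi)|^2\mathbb 1_{\Omega_m}+|\ell_h(\varphi)|^2P(\Omega_m^c)\\&\le2\,\mathbb E|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}+2|\ell_h(\varphi_m-\varphi)|^2+|\ell_h(\varphi)|^2P(\Omega_m^c),\end{aligned}\tag{A.5}$$
where we bound each term separately. Let $\mho_m:=\{\|Q_m\|\,\|[T]_m^{-1}\|\le1/2\}$ and let $\mho_m^c$ denote its complement. By employing $\|[\hat T]_m^{-1}\|\mathbb 1_{\mho_m}\le2\|[T]_m^{-1}\|$ and $\|[\hat T]_m^{-1}\|^2\mathbb 1_{\Omega_m}\le n$ it follows that
$$\begin{aligned}|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}&\le2|[h]_m^t[T]_m^{-1}V_m|^2+2|[h]_m^t[\hat T]_m^{-1}Q_m[T]_m^{-1}V_m|^2\mathbb 1_{\Omega_m}(\mathbb 1_{\mho_m}+\mathbb 1_{\mho_m^c})\\&\le2|[h]_m^t[T]_m^{-1}V_m|^2+2\|[h]_m^t[T]_m^{-1}\|^2\big\{4\|[T]_m^{-1}\|^2\|Q_m\|^2\|V_m\|^2+n\|Q_m\|^2\|V_m\|^2\mathbb 1_{\mho_m^c}\big\}.\end{aligned}$$
Thus, from the estimates (A.9), (A.10), and (A.11) in Lemma A.1 we infer
$$\mathbb E|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}\le C(\gamma)\,n^{-1}\|[h]_m^t[T]_m^{-1}\|^2\eta^4\big(\sigma^2+\|\varphi-\varphi_m\|_\gamma^2\big)\times\big\{1+n^{-1}m^3\|[T]_m^{-1}\|^2+m^3P^{1/4}(\mho_m^c)\big\}.\tag{A.6}$$
Let $m=m_n$ satisfy $m_n^{-1}=o(1)$, $m_n=o(n)$, and condition (3.3). We have $\sqrt n\ge2\|[T]_{m_n}^{-1}\|$ and thus $\mho_{m_n}\subset\Omega_{m_n}$ for $n$ sufficiently large. From Lemma A.3 it follows that
$$m_n^{12}P(\mho_{m_n}^c)\le2\exp\big\{-\big(32\eta^2n^{-1}m_n^3\|[T]_{m_n}^{-1}\|^2\big)^{-1}+14\log m_n\big\}=o(1)\quad\text{as }n\to\infty,$$
since $\big(4n^{-1}m_n^3\|[T]_{m_n}^{-1}\|^2\big)^{-1}\le4\eta^2n$ for $n$ sufficiently large. Thus, in particular, $P(\Omega_{m_n}^c)=o(1)$. Consequently, as $n\to\infty$ we obtain $\mathbb E|\hat\ell_{m_n}-\ell_h(\varphi_{m_n})|^2\mathbb 1_{\Omega_{m_n}}=o(1)$ since $\|[h]_{m_n}^t[T]_{m_n}^{-1}\|^2=o(n)$. Moreover, as $n\to\infty$ it holds $|\ell_h(\varphi_{m_n})-\ell_h(\varphi)|^2\le\|h\|_{1/\gamma}^2\|\varphi-\varphi_{m_n}\|_\gamma^2=o(1)$ due to condition (3.2), and $|\ell_h(\varphi)|^2P(\Omega_{m_n}^c)\le\|h\|_{1/\gamma}^2\|\varphi\|_\gamma^2P(\Omega_{m_n}^c)=o(1)$. This together with decomposition (A.5) proves the result. □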

Proof of Corollary 3.2. The assertion follows directly from Proposition 3.1; it only remains to check conditions (3.2) and (3.3). We make use of the decomposition $\|\varphi-\varphi_m\|_\gamma\le\|E_m^\perp\varphi\|_\gamma+\|E_m\varphi-\varphi_m\|_\gamma$. As in the proof of Lemma A.2 we conclude $\|E_m\varphi-\varphi_m\|_\gamma^2\le Dd\|E_m^\perp\varphi\|_\gamma^2$. By using Lebesgue's dominated convergence theorem we observe $\|E_m^\perp\varphi\|_\gamma=o(1)$ as $m\to\infty$, and hence (3.2) holds. The condition $T\in\mathcal T_d^D$ implies $\|[h]_m^t[T]_m^{-1}\|^2\le D\|h\|^2v_m^{-1}$ and $\|[T]_m^{-1}\|^2\le Dv_m^{-1}$ for all $m\ge1$, since $v$ is nonincreasing. Thereby, condition (3.5) implies condition (3.3), which completes the proof. □
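Proof of the upper bound.

Proof of Theorem 3.3. The proof is based on inequality (A.5). Applying estimate (A.14) in Lemma A.2 gives $|\ell_h(\varphi_m-\varphi)|^2\le2\rho\big\{\sum_{j>m}[h]_j^2\gamma_j^{-1}+Dd\,v_m\gamma_m^{-1}\sum_{j=1}^m[h]_j^2v_j^{-1}\big\}$ for all $\varphi\in\mathcal F_\gamma^\rho$ and $h\in\mathcal F_{1/\gamma}$. Since $|\ell_h(\varphi)|^2\le\|h\|_{1/\gamma}^2\|\varphi\|_\gamma^2$ and $\|\varphi\|_\gamma^2\le\rho$ we conclude
$$\mathbb E|\hat\ell_m-\ell_h(\varphi)|^2\le2\,\mathbb E|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}+4\rho\Big\{\sum_{j>m}[h]_j^2\gamma_j^{-1}+Dd\,v_m\gamma_m^{-1}\sum_{j=1}^m[h]_j^2v_j^{-1}\Big\}+\rho\|h\|_{1/\gamma}^2P(\Omega_m^c).\tag{A.7}$$
By employing $\|Q_m[T]_m^{-1}\|^2\mathbb 1_{\mho_m}\le m^{-1}$ and $\|[\hat T]_m^{-1}\|^2\mathbb 1_{\Omega_m}\le n$ it follows that
$$|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}\le2|[h]_m^t[T]_m^{-1}V_m|^2+2m^{-1}\|[h]_m^t[T]_m^{-1}\|^2\|V_m\|^2+2n\|[h]_m^t[T]_m^{-1}\|^2\|Q_m\|^2\|V_m\|^2\mathbb 1_{\mho_m^c}.$$
Due to $T\in\mathcal T_d^D$ and $\varphi\in\mathcal F_\gamma^\rho$ we have $\|[h]_m^t[T]_m^{-1}\|^2\le D\sum_{j=1}^m[h]_j^2v_j^{-1}$ and $\|\varphi-\varphi_m\|_\gamma^2\le2\rho(1+Dd)$ (cf. (A.13) in Lemma A.2), respectively. Thereby, similarly to the proof of Proposition 3.1 we get
$$\mathbb E|\hat\ell_m-\ell_h(\varphi_m)|^2\mathbb 1_{\Omega_m}\le C(\gamma)D(\sigma^2+\eta^2dD\rho)\,n^{-1}\sum_{j=1}^m[h]_j^2v_j^{-1}\big\{1+m^3P^{1/4}(\mho_m^c)\big\}.$$
Combining the last estimate with (A.7) yields
$$\begin{aligned}\mathbb E|\hat\ell_m-\ell_h(\varphi)|^2&\le C(\gamma)D(\sigma^2+\eta^2dD\rho)\max\Big\{\sum_{j>m}[h]_j^2\gamma_j^{-1},\ \max\Big(\frac{v_m}{\gamma_m},n^{-1}\Big)\sum_{j=1}^m[h]_j^2v_j^{-1}\Big\}\\&\qquad\times\big\{1+m^3P^{1/4}(\mho_m^c)\big\}+\rho\|h\|_{1/\gamma}^2P(\Omega_m^c).\end{aligned}\tag{A.8}$$
Consider now the optimal choice $m=m_n^*$ defined in (2.3); then we have
$$\mathbb E|\hat\ell_{m_n^*}-\ell_h(\varphi)|^2\le C(\gamma)D\big(\sigma^2+\rho\eta^2dD+\|h\|_{1/\gamma}^2\big)\mathcal R^h_n\times\Big\{1+\big((m_n^*)^{12}P(\mho_{m_n^*}^c)\big)^{1/4}+(\mathcal R^h_n)^{-1}P(\Omega_{m_n^*}^c)\Big\},$$
and hence the assertion follows by making use of Lemma A.4. □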



Technical assertions.

The following paragraph gathers technical results used in the proofs of Section 3. Below we consider the set $\mathbb S^m:=\{s\in\mathbb R^m:\|s\|=1\}$.

Lemma A.1. Suppose that $P_{U|W}\in\mathcal U_\sigma$ and that the joint distribution of $(Z,W)$ satisfies Assumption 3. If in addition $\varphi\in\mathcal F_\gamma$ with $\gamma$ satisfying Assumption 1, then for all $m\ge1$ we have
$$\sup_{s\in\mathbb S^m}\mathbb E|s^tV_m|^2\le2n^{-1}\big(\sigma^2+C(\gamma)\eta^2\|\varphi-\varphi_m\|_\gamma^2\big),\tag{A.9}$$
$$\mathbb E\|V_m\|^4\le C(\gamma)\big(n^{-1}m\eta^2(\sigma^2+\|\varphi-\varphi_m\|_\gamma^2)\big)^2,\tag{A.10}$$
$$\mathbb E\|Q_m\|^8\le C\big(n^{-1}m^2\eta^2\big)^4.\tag{A.11}$$
Proof. Proof of (A.9). Since $\big(\{U_i+\varphi(Z_i)-\varphi_m(Z_i)\}\sum_{j=1}^ms_jf_j(W_i)\big)$, $1\le i\le n$, are i.i.d. with mean zero, we have $\mathbb E|s^tV_m|^2=n^{-1}\mathbb E\big|\{U+\varphi(Z)-\varphi_m(Z)\}\sum_{j=1}^ms_jf_j(W)\big|^2$. Then (A.9) follows from $\mathbb E[U^2|W]\le(\mathbb E[U^4|W])^{1/2}\le\sigma^2$ and from Assumption 3 (i), i.e., $\sup_{j\in\mathbb N}\mathbb E[e_j^2(Z)|W]\le\eta^2$. Indeed, the condition $|j|^3\gamma_j^{-1}=o(1)$ (cf. Assumption 1) gives $\sum_{l\ge1}\gamma_l^{-1}<\infty$ and thus
$$\begin{aligned}\mathbb E\Big|\{\varphi(Z)-\varphi_m(Z)\}\sum_{j=1}^ms_jf_j(W)\Big|^2&\le\|\varphi-\varphi_m\|_\gamma^2\sum_{l=1}^\infty\gamma_l^{-1}\,\mathbb E\Big|e_l(Z)\sum_{j=1}^ms_jf_j(W)\Big|^2\\&\le C(\gamma)\eta^2\|\varphi-\varphi_m\|_\gamma^2\sum_{j=1}^ms_j^2=C(\gamma)\eta^2\|\varphi-\varphi_m\|_\gamma^2.\end{aligned}$$

Proof of (A.10). Observe that for each $1\le j\le m$, $\big(\{U_i+\varphi(Z_i)-\varphi_m(Z_i)\}f_j(W_i)\big)$, $1\le i\le n$, are i.i.d. with mean zero. It follows from Theorem 2.10 in Petrov [1995] that $\mathbb E\|V_m\|^4\le Cn^{-2}m^2\sup_{j\in\mathbb N}\mathbb E|\{U+\varphi(Z)-\varphi_m(Z)\}f_j(W)|^4$. Thereby, (A.10) follows from $\mathbb E[U^4|W]\le\sigma^4$ and $\sup_{j\in\mathbb N}\mathbb E[f_j^4(W)]\le\eta^4$ together with $\mathbb E|\{\varphi(Z)-\varphi_m(Z)\}f_j(W)|^4\le C(\gamma)\eta^4\|\varphi-\varphi_m\|_\gamma^4$, which can be realized as follows. Since $[T(\varphi-\varphi_m)]_j=0$ we have $\{\varphi(Z)-\varphi_m(Z)\}f_j(W)=\sum_{l\ge1}[\varphi-\varphi_m]_l\{e_l(Z)f_j(W)-[T]_{j,l}\}$. Furthermore, Assumption 3 (ii), i.e., $\sup_{j,l\in\mathbb N}\mathbb E|e_l(Z)f_j(W)-[T]_{j,l}|^4\le4!\,\eta^4$, implies
$$\mathbb E|\{\varphi(Z)-\varphi_m(Z)\}f_j(W)|^4\le\|\varphi-\varphi_m\|_\gamma^4\,\mathbb E\Big|\sum_{l\ge1}\gamma_l^{-1}|e_l(Z)f_j(W)-[T]_{j,l}|^2\Big|^2\le C(\gamma)\eta^4\|\varphi-\varphi_m\|_\gamma^4.$$

Proof of (A.11). The random variables $\big(e_l(Z_i)f_j(W_i)-[T]_{j,l}\big)$, $1\le i\le n$, are i.i.d. with mean zero for each $1\le j,l\le m$. Hence, Theorem 2.10 in Petrov [1995] implies $\mathbb E\|Q_m\|^8\le Cn^{-4}m^8\sup_{j,l\in\mathbb N}\mathbb E|e_l(Z)f_j(W)-[T]_{j,l}|^8$, and thus the assertion follows from Assumption 3 (ii), which completes the proof. □
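Lemma A.2. If $T\in\mathcal T_d^D$ and $\varphi\in\mathcal F_\gamma^\rho$, then for all $m\ge1$ we have
$$\|E_m\varphi-\varphi_m\|_\gamma^2\le Dd\rho,\tag{A.12}$$
$$\|\varphi-\varphi_m\|_\gamma^2\le2(1+Dd)\rho,\tag{A.13}$$
$$|\langle h,\varphi-\varphi_m\rangle_Z|^2\le2\rho\Big\{\sum_{j>m}[h]_j^2\gamma_j^{-1}+Dd\,v_m\gamma_m^{-1}\sum_{j=1}^m[h]_j^2v_j^{-1}\Big\}.\tag{A.14}$$

Proof. Consider (A.12). Since $T\in\mathcal T_d^D$, the identity $[E_m\varphi-\varphi_m]_m=-[T]_m^{-1}[TE_m^\perp\varphi]_m$ implies $\|E_m\varphi-\varphi_m\|_v^2\le D\|TE_m^\perp\varphi\|^2\le Dd\|E_m^\perp\varphi\|_v^2$. Consequently,
$$\|E_m\varphi-\varphi_m\|_v^2\le Dd\,\gamma_m^{-1}v_m\|\varphi\|_\gamma^2,\tag{A.15}$$
because $(\gamma_j^{-1}v_j)_{j\ge1}$ is nonincreasing, and thus $\|E_m\varphi-\varphi_m\|_\gamma^2\le\gamma_mv_m^{-1}\|E_m\varphi-\varphi_m\|_v^2$. By combination of the last estimate and (A.15) we obtain the assertion (A.12). By employing the decomposition $\|\varphi-\varphi_m\|_\gamma^2\le2\|\varphi-E_m\varphi\|_\gamma^2+2\|E_m\varphi-\varphi_m\|_\gamma^2$, the bound (A.13) follows from (A.12) and $\|\varphi-E_m\varphi\|_\gamma^2\le\|\varphi\|_\gamma^2$. It remains to show (A.14). Applying the Cauchy-Schwarz inequality gives $|\langle h,\varphi-E_m\varphi\rangle_Z|^2\le\|\varphi\|_\gamma^2\sum_{j>m}[h]_j^2\gamma_j^{-1}$ and, due to (A.15), $|\langle h,E_m\varphi-\varphi_m\rangle_Z|^2\le Dd\|\varphi\|_\gamma^2v_m\gamma_m^{-1}\sum_{j=1}^m[h]_j^2v_j^{-1}$. Thereby (A.14) follows from the inequality $|\langle h,\varphi-\varphi_m\rangle_Z|^2\le2|\langle h,\varphi-E_m\varphi\rangle_Z|^2+2|\langle h,E_m\varphi-\varphi_m\rangle_Z|^2$, which completes the proof. □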
Lemma A.3. Suppose that the joint distribution of $(Z,W)$ satisfies Assumption 3. Then for all $n\ge1$ and $m\ge1$ we have
$$P\big(m^{-2}n\|Q_m\|^2\ge t\big)\le2\exp\Big(-\frac{t}{8\eta^2}+2\log m\Big)\quad\text{for all }0<t\le4\eta^2n.\tag{A.16}$$

Proof. Our proof starts with the observation that for all $j,l\in\mathbb N$ condition (ii) in Assumption 3 implies for all $t>0$
$$P\Big(\Big|\sum_{i=1}^n\{e_l(Z_i)f_j(W_i)-\mathbb E[e_l(Z)f_j(W)]\}\Big|\ge t\Big)\le2\exp\Big(\frac{-t^2}{4n\eta^2+2\eta t}\Big),$$
which is just Bernstein's inequality (cf. Bosq [1998]). This implies for all $0<t\le2n\eta$
$$\sup_{j,l\in\mathbb N}P\Big(\Big|\sum_{i=1}^n\{e_l(Z_i)f_j(W_i)-\mathbb E[e_l(Z)f_j(W)]\}\Big|\ge t\Big)\le2\exp\Big(-\frac{t^2}{8n\eta^2}\Big).\tag{A.17}$$
It is well known that $m^{-1}\|[A]_m\|\le\max_{1\le j,l\le m}|[A]_{j,l}|$ for any $m\times m$ matrix $[A]_m$. Combining the last estimate and (A.17) we obtain for all $0<t\le2\eta n^{1/2}$
$$P\big(m^{-1}n^{1/2}\|Q_m\|\ge t\big)\le\sum_{j,l=1}^mP\Big(\Big|\sum_{i=1}^n\{e_l(Z_i)f_j(W_i)-\mathbb E[e_l(Z)f_j(W)]\}\Big|\ge n^{1/2}t\Big)\le2\exp\Big(-\frac{t^2}{8\eta^2}+2\log m\Big),$$
which is (A.16) after replacing $t$ by $t^{1/2}$. □
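The matrix fact used in the proof of Lemma A.3, $\|[A]_m\|\le m\max_{j,l}|[A]_{j,l}|$ for the spectral norm, follows from $\|[A]_m\|\le\|[A]_m\|_F\le m\max_{j,l}|[A]_{j,l}|$. A small numerical illustration on random matrices is given below. The all-ones matrix attains equality, with spectral norm $m$. The function name is hypothetical.

```python
import numpy as np


def entrywise_bound_holds(A):
    """Check ||A|| <= m * max_{j,l} |A_{jl}| for an m x m matrix A (spectral norm)."""
    m = A.shape[0]
    return np.linalg.norm(A, 2) <= m * np.max(np.abs(A)) * (1 + 1e-12)
```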

Lemma A.4. Under the conditions of Theorem 3.3 we have for all $n\ge1$
$$(m_n^*)^{12}P(\mho_{m_n^*}^c)\le C(\gamma,v,\eta,D),\tag{A.18}$$
$$(\mathcal R^h_n)^{-1}P(\Omega_{m_n^*}^c)\le C(\gamma,v,\eta,h,D).\tag{A.19}$$

Proof. Proof of (A.18). Since $\|[T]_m^{-1}\|^2\le Dv_m^{-1}$ due to $T\in\mathcal T_d^D$, it follows from Lemma A.3 for all $m,n\ge1$ that
$$P(\mho_m^c)\le P\Big(m^{-2}n\|Q_m\|^2\ge\frac{nv_m}{4Dm^3}\Big)\le2\exp\Big(-\frac{nv_m}{32D\eta^2m^3}+2\log m\Big),$$
since $(4Dm^3v_m^{-1})^{-1}\le1\le4\eta^2$ for all $m\ge1$. Due to condition (3.6) there exists $n_0\ge1$ such that $nv_{m_n^*}\ge448D\eta^2(m_n^*)^3\log m_n^*$ for all $n\ge n_0$. Consequently, $(m_n^*)^{12}P(\mho_{m_n^*}^c)\le2$ for all $n\ge n_0$, while trivially $(m_n^*)^{12}P(\mho_{m_n^*}^c)\le(m_{n_0}^*)^{12}$ for all $n\le n_0$. This gives (A.18), since $n_0$ and $m_n^*$ depend on $\gamma$, $v$, $\eta$ and $D$ only.

Consider (A.19). Let $n_0\in\mathbb N$ be such that $\max\{|\log\mathcal R^h_n|,\log m_n^*\}(m_n^*)^3\le nv_{m_n^*}(96D\eta^2)^{-1}$ for all $n\ge n_0$. Observe that $\mho_m\subset\Omega_m$ if $n\ge4Dv_m^{-1}$. Since $(m_n^*)^{-3}nv_{m_n^*}\ge96D\eta^2$ for all $n\ge n_0$, it follows that $nv_{m_n^*}\ge4D$ for all $n\ge n_0$. Hence $(\mathcal R^h_n)^{-1}P(\Omega_{m_n^*}^c)\le(\mathcal R^h_n)^{-1}P(\mho_{m_n^*}^c)\le2$ for all $n\ge n_0$, as in the proof of (A.18). Combining the last estimate and the elementary inequality $(\mathcal R^h_n)^{-1}P(\Omega_{m_n^*}^c)\le(\mathcal R^h_{n_0})^{-1}$ for all $n<n_0$ shows (A.19), since $n_0$ depends on $\gamma$, $v$, $\eta$, $h$ and $D$ only, which completes the proof. □
A.3. Proofs of Section 3.4

Proof of Proposition 3.5. Proof of (pp). From the definition of $m_n^*$ in (2.3) it follows that $m_n^*\sim n^{1/(2p+2a)}$. Consider case (i). The condition $s-a<1/2$ implies $n^{-1}\sum_{j=1}^{m_n^*}|j|^{2a-2s}\sim n^{-1}(m_n^*)^{2a-2s+1}\sim n^{-(2p+2s-1)/(2p+2a)}$ and, moreover, $\sum_{j>m_n^*}|j|^{-2p-2s}\sim n^{-(2p+2s-1)/(2p+2a)}$ since $p+s>1/2$. If $s-a=1/2$, then $n^{-1}\sum_{j=1}^{m_n^*}|j|^{2a-2s}\sim n^{-1}\log(n^{1/(2p+2a)})$ and $\sum_{j>m_n^*}|j|^{-2p-2s}\sim n^{-1}$. In the case $s-a>1/2$ it follows that $\sum_{j=1}^\infty|j|^{2a-2s}$ is bounded, whereas $\sum_{j>m_n^*}|j|^{-2p-2s}\lesssim n^{-1}$, and hence $\mathcal R^h_n\sim n^{-1}$. To prove (ii) we make use of Corollary 2.2. We observe that if $s-a\ge0$ the sequence $\omega v$ is bounded from below, and hence $\mathcal R^\omega_n\sim n^{-1}$. Otherwise, the condition $s-a<0$ implies $\mathcal R^\omega_n\sim n^{-(p+s)/(p+a)}$.

Proof of (pe). Note that $m_n^*$ satisfies $m_n^*\sim\big(\log[n(\log n)^{-p/a}]\big)^{1/(2a)}$. In order to prove (i), we calculate that $\sum_{j>m_n^*}|j|^{-2p-2s}\sim(\log n)^{-(2p+2s-1)/(2a)}$ and $n^{-1}\sum_{j=1}^{m_n^*}\exp(|j|^{2a})|j|^{-2s}\lesssim(\log n)^{-(2p+2s-1)/(2a)}$. In case (ii) we immediately obtain $\mathcal R^\omega_n\sim(\log n)^{-(p+s)/a}$.

Proof of (ep). It holds true that $m_n^*\sim\big(\log[n(\log n)^{-a/p}]\big)^{1/(2p)}$. Consider case (i). If $s-a<1/2$, then $n^{-1}\sum_{j=1}^{m_n^*}|j|^{2a-2s}\sim n^{-1}(\log n)^{(2a-2s+1)/(2p)}$. If $s-a=1/2$, we conclude $n^{-1}\sum_{j=1}^{m_n^*}|j|^{2a-2s}\sim n^{-1}\log(\log n)$. On the other hand, the condition $s-a>1/2$ implies that $\sum_{j=1}^\infty|j|^{2a-2s}$ is bounded, and thus we obtain the parametric rate $n^{-1}$. Moreover, it is easily seen that $\sum_{j>m_n^*}|j|^{-2s}\exp(-|j|^{2p})\lesssim n^{-1}\sum_{j=1}^{m_n^*}|j|^{2a-2s}$. In case (ii), if $s-a\ge0$, then the sequence $\omega v$ is bounded from below as mentioned above, and thus $\mathcal R^\omega_n\sim n^{-1}$. If $s-a<0$, then $\mathcal R^\omega_n\sim n^{-1}(\log n)^{(a-s)/p}$, which completes the proof. □
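The case distinctions in the proof of (pp) rest on elementary power-sum asymptotics. If $q=2a-2s>-1$, then $\sum_{j\le m}j^{q}\sim m^{q+1}/(q+1)$. If $q=-1$, the sum grows like $\log m$. If $q<-1$, it stays bounded. A tail $\sum_{j>m}j^{-r}$ with $r=2p+2s>1$ behaves like $m^{1-r}/(r-1)$. The sketch below illustrates these facts numerically for a few exponents chosen only for illustration. The tail is truncated at a hypothetical cutoff.

```python
import math


def head_sum(m, q):
    """sum_{j=1}^m j^q, the sum n^{-1} sum_{j<=m} |j|^{2a-2s} without the factor n^{-1}."""
    return math.fsum(j ** q for j in range(1, m + 1))


def tail_sum(m, r, cutoff=10 ** 5):
    """Truncation of sum_{j>m} j^{-r} at j <= cutoff (r > 1), the bias part for (pp)."""
    return math.fsum(j ** (-r) for j in range(m + 1, cutoff + 1))
```

For example, with $2a-2s=0.5$ the ratio $\sum_{j\le m}j^{0.5}/m^{1.5}$ approaches $2/3$. With $2a-2s=-1$ the difference $\sum_{j\le m}j^{-1}-\log m$ approaches Euler's constant.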



A.4. Proofs of Section 4

At the end of this section we shall prove six technical lemmata (A.7-A.12) which are used in the following proofs. Let us introduce a nondecreasing sequence $\Lambda:=(\Lambda_m)_{m\ge1}$ and its empirical analogue $\hat\Lambda:=(\hat\Lambda_m)_{m\ge1}$ by $\Lambda_m:=\max_{1\le m'\le m}\|[h]_{m'}^t[T]_{m'}^{-1}\|^2$ and $\hat\Lambda_m:=\max_{1\le m'\le m}\|[h]_{m'}^t[\hat T]_{m'}^{-1}\|^2$, respectively. Similarly to $M_n^+$ introduced in (4.5) we define
$$M_n^-:=\min\Big\{2\le m\le M_n^h:\ 4Dv_m^{-1}m^3\max_{1\le j\le m}[h]_j^2>a_n\Big\}-1,\tag{A.20}$$
where we set $M_n^-:=M_n^h$ if the set is empty. Thus, $M_n^-$ takes values between 1 and $M_n^h$. In the following, $C>0$ denotes a constant only depending on the classes $\mathcal F_\gamma^\rho$, $\mathcal T_d^D$, the constants $\sigma$, $\eta$ and the representer $h$. For ease of notation, the value of $C>0$ may change from line to line.

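Proof of Theorem 4.1. The proof of the theorem is based on inequality (4.2). Observe that by Lemma A.10 we have $M_n^-\le M_n\le M_n^+$. Due to the condition $(m_n^\circ)^3\max_{1\le j\le m_n^\circ}[h]_j^2=o(a_nv_{m_n^\circ})$ as $n\to\infty$, there exists $n_0\ge1$ only depending on $h$, $\gamma$, and $v$ such that $m_n^\circ\le M_n^-$ for all $n\ge n_0$. We distinguish in the following the cases $n\ge n_0$ and $n<n_0$. First, consider $n\ge n_0$. Applying Corollary A.6 together with estimate (4.2) implies
$$\mathbb E|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2\le C\big\{\mathrm{pen}_{m_n^\circ}+\mathrm{bias}_{m_n^\circ}+n^{-1}\big\}.$$
From the definition of $\mathrm{pen}_m$ we infer $\mathrm{pen}_m\le C(1+\log n)n^{-1}D\sum_{j=1}^m[h]_j^2v_j^{-1}$ since $T\in\mathcal T_d^D$, $P_{U|W}\in\mathcal U_\sigma^\infty$, and $\varphi\in\mathcal F_\gamma^\rho$. Moreover, since $\varphi\in\mathcal F_\gamma^\rho$ and $h\in\mathcal F_{1/\gamma}$, estimate (A.14) in Lemma A.2 implies for all $1\le m\le M_n$ that $\mathrm{bias}_m\le\sup_{m'\ge m}2\rho\big\{\sum_{j>m'}[h]_j^2\gamma_j^{-1}+Dd\,v_{m'}\gamma_{m'}^{-1}\sum_{j=1}^{m'}[h]_j^2v_j^{-1}\big\}$. Consequently,
$$\mathbb E|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2\le C\Big\{\max\Big(\sum_{j>m_n^\circ}[h]_j^2\gamma_j^{-1},\ \frac{1+\log n}{n}\sum_{j=1}^{m_n^\circ}[h]_j^2v_j^{-1}\Big)+n^{-1}\Big\}.$$
Consider now $n<n_0$. Observe that for all $1\le m\le M_n^h$ it holds
$$\begin{aligned}|\hat\ell_m-\ell_h(\varphi)|^2&\le2|[h]_m^t[\hat T]_m^{-1}V_m|^2\mathbb 1_{\Omega_m}+2\big(|\ell_h(\varphi_m-\varphi)|^2+|\ell_h(\varphi)|^2\mathbb 1_{\Omega_m^c}\big)\\&\le2n\|[h]_m\|^2\|V_m\|^2+2\big(|\ell_h(\varphi_m-\varphi)|^2+|\ell_h(\varphi)|^2\mathbb 1_{\Omega_m^c}\big).\end{aligned}\tag{A.21}$$
From the definition of $M_n^h$ we infer $\|[h]_{M_n^h}\|^2\le[h]_1^2n^{5/4}$. Hence inequality (A.10) in Lemma A.1, inequality (A.13) in Lemma A.2 and Lemma A.12 yield for all $\varphi\in\mathcal F_\gamma^\rho$ and $h\in\mathcal F_{1/\gamma}$
$$n\,\mathbb E|\hat\ell_{\tilde m}-\ell_h(\varphi)|^2\le2[h]_1^2n^{13/4}\,\mathbb E\|V_{M_n^h}\|^2+6\rho\|h\|_{1/\gamma}^2(1+Dd)\,n\le C,$$
which proves the result. □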

Lemma A. 5. Consider (pen m ) m ^i withpen m := 24(24E[lf 2 ]+967/ 2 pm 3 7~ 1 )(l + logn)n~ 1 . 
Then under the conditions of Theorem 4-1 we have for all n ^ 1 

sup sup E max ( \£ m - 4(^m)| 2 - \ P~en m ) ^Cn' 1 . 
T ^Tl D Pu\ w <aU™ m°<m<M+ V / + 

Proof. Similarly to the proof of Theorem 3.3 we obtain the decomposition 

|L-4(^)| 2 < 2|[/ i ]yTLV m | 2 + 2m- 1 ||[/ l ]yTL 1 || 2 ||y m || 2 + 

2n\\[h]l\T}^Q m \\ 2 \\V m \\ 2 luc, +|4(^m)| 2 log, • 
Observe that ||[/t]m[^']m 1 || 2 ^ ^-m for all m ^ 1 and hence, we have for all m° ^ m ^ 

i? , / m2 1 — - \ /OA /IWrnl^m 1 ^™! 2 pen. 
Km -4(^)1 ~^pen m < 2A 



+ 2A m ( _ P^n ) + 2n^ m \\Qmf\\Vmf lug, +I4(^m)| a Iflg 



^mll 2 _ pen, 
m 24 A... 

— • -*m ~\~ Hm Him T\m- 

Consider the first two right hand side terms. We calculate 

— M+ 

E max (J m + < 4 max sup E (\s l V m \ 2 - V A m . 

m^ra^M+ m° ^ms£M+ seS m v 24ZA m / + ' 

From the definition of $\overline{\mathrm{pen}}_m$ we infer for all $s \in \mathbb{S}^m$ and $m^\circ \le m \le M_n^+$

$$n\,\mathbb{E}\Big(|s^tV_m|^2 - \frac{\overline{\mathrm{pen}}_m}{24\Delta_m}\Big)_+ \le 2\,\mathbb{E}\Big(\Big|n^{-1/2}\sum_{i=1}^n U_i\,\xi_s(W_i)\Big|^2 - 12\,\mathbb{E}[U^2](1+\log n)\Big)_+$$
$$+ 2\,\mathbb{E}\Big(\Big|n^{-1/2}\sum_{i=1}^n(\varphi(Z_i) - \varphi_m(Z_i))\,\xi_s(W_i)\Big|^2 - 48\eta^2\rho\,m^3\gamma_m^{-1}(1+\log n)\Big)_+ \le C(\sigma,\eta,\gamma,\rho,D)\,n^{-1},$$

where the last inequality follows from Lemmas A.7 and A.8. Due to the definition of $M_n^+$ and since $\Delta$ is nondecreasing we have $n^{-1}\sum_{m=1}^{M_n^+}\Delta_m \le \ldots \le 4D^2$. Consequently, $\mathbb{E}\max_{m^\circ\le m\le M_n^+}(I_m + II_m) \le Cn^{-1}$. Further, we obtain for $\varphi \in \mathcal{F}_\gamma^\rho$ and $h \in \mathcal{F}_{1/\gamma}$



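$$\mathbb{E}\max_{m^\circ\le m\le M_n^+}III_m \le n\Delta_{M_n^+}\big(\mathbb{E}\|Q_{M_n^+}\|^8\big)^{1/4}\big(\mathbb{E}\|V_{M_n^+}\|^4\big)^{1/2}P^{1/4}\Big(\bigcup_{m=1}^{M_n^+}\mho_m^c\Big)$$
$$\le C(\gamma)\eta^4\big(\sigma^2 + (1 + Dd)\rho\big)\,n^{-1}\Delta_{M_n^+}(M_n^+)^3\,P^{1/4}\Big(\bigcup_{m=1}^{M_n^+}\mho_m^c\Big),$$

where the last inequality is due to Lemma A.1, and

$$\mathbb{E}\max_{m^\circ\le m\le M_n^+}IV_m \le \rho\|h\|_{1/\gamma}^2\,P\Big(\bigcup_{m=1}^{M_n^+}\Omega_m^c\Big).$$

Now applying $n^{-1}\Delta_{M_n^+}(M_n^+)^3 \le 4D^2$ and Lemma A.9 gives $\mathbb{E}\max_{m^\circ\le m\le M_n^+}(III_m + IV_m) \le Cn^{-1}$, which completes the proof. □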

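The bound on $I_m + II_m$ in the proof of Lemma A.5 splits $n|s^tV_m|^2$ into an error part and an approximation part and thresholds each one separately. Assuming, as that decomposition suggests, that $\sqrt n\,s^tV_m = a + b$ with $a = n^{-1/2}\sum_i U_i\xi_s(W_i)$ and $b = n^{-1/2}\sum_i(\varphi(Z_i)-\varphi_m(Z_i))\xi_s(W_i)$, the step reduces to the elementary inequality $((a+b)^2 - 2c - 2d)_+ \le 2(a^2-c)_+ + 2(b^2-d)_+$ for $c, d \ge 0$, applied with $c = 12\,\mathbb{E}[U^2](1+\log n)$ and $d = 48\eta^2\rho m^3\gamma_m^{-1}(1+\log n)$. The Python sketch below (helper names are ours) checks the inequality on random inputs. It is an illustration, not part of the proof.

```python
import random


def pos(x: float) -> float:
    """Positive part (x)_+ = max(x, 0)."""
    return max(x, 0.0)


def split_holds(a: float, b: float, c: float, d: float) -> bool:
    """Check ((a + b)^2 - 2c - 2d)_+ <= 2(a^2 - c)_+ + 2(b^2 - d)_+ for c, d >= 0."""
    lhs = pos((a + b) ** 2 - 2 * c - 2 * d)
    rhs = 2 * pos(a * a - c) + 2 * pos(b * b - d)
    return lhs <= rhs + 1e-9  # small slack for floating-point rounding


def check_random(trials: int = 20_000, seed: int = 0) -> bool:
    """Test the inequality for random a, b in [-5, 5] and thresholds c, d in [0, 10]."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        c, d = rng.uniform(0, 10), rng.uniform(0, 10)
        if not split_holds(a, b, c, d):
            return False
    return True


print(check_random())  # True
```

Summing the two thresholds gives $2c + 2d = (24\,\mathbb{E}[U^2] + 96\eta^2\rho m^3\gamma_m^{-1})(1+\log n)$, which matches the constants 24 and 96 inside the penalty of Lemma A.5.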
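Corollary A.6. Under the conditions of Theorem 4.1 we have for all $n \ge 1$

$$\sup_{T\in\mathcal{T}_d^D}\ \sup_{P_{U|W}\in\mathcal{U}_\sigma^\infty}\ \mathbb{E}\max_{m^\circ\le m\le M_n^+}\Big(|\widehat\ell_m - \ell_h(\varphi_m)|^2 - \tfrac16\,\mathrm{pen}_m\Big)_+ \le Cn^{-1}.$$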

Proof. Observe that $m^3\gamma_m^{-1} = o(1)$ and $\|\varphi - \varphi_m\|_Z = o(1)$ as $m \to \infty$ due to Assumption 1 and $T \in \mathcal{T}_d^D$ (cf. proof of Corollary 3.2), respectively. Thereby, there exists a constant $n_o$ only depending on $\gamma$, $\rho$ and $\eta$ such that for all $n \ge n_o$ and $m \ge m^\circ$ we have

$$24\,\mathbb{E}[U^2] + 96\eta^2\rho\,m^3\gamma_m^{-1} \le 72\big(\mathbb{E}[Y^2] + \|\varphi_m\|_Z^2 + \|\varphi - \varphi_m\|_Z^2\big) + 96\eta^2\rho\,m^3\gamma_m^{-1} \le \ldots \tag{A.22}$$

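We distinguish in the following the cases $n < n_o$ and $n \ge n_o$. First, consider $n < n_o$. Due to $n^{-1}\sum_{m=1}^{M_n^+}\Delta_m \le 4D^2$ and inequality (A.9) in Lemma A.1 we calculate for all $s \in \mathbb{S}^m$

$$\sum_{m=1}^{M_n^+}\Delta_m\,\mathbb{E}\Big(|s^tV_m|^2 - \frac{\overline{\mathrm{pen}}_m}{24\Delta_m}\Big)_+ \le \sum_{m=1}^{M_n^+}\Delta_m\,\mathbb{E}|s^tV_m|^2 \le 8nD^2\big(\sigma^2 + C(\gamma)\eta^2\|\varphi - \varphi_m\|^2\big)n^{-1}.$$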

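Therefore, following line by line the proof of Lemma A.5 it is easily seen that $n\,\mathbb{E}\max_{m^\circ\le m\le M_n^+}(|\widehat\ell_m - \ell_h(\varphi_m)|^2 - \tfrac16\mathrm{pen}_m)_+ \le C$. Consider now $n \ge n_o$. Inequality (A.22) implies $\overline{\mathrm{pen}}_m \le \mathrm{pen}_m$ and thus, $(|\widehat\ell_m - \ell_h(\varphi_m)|^2 - \tfrac16\mathrm{pen}_m)_+ \le (|\widehat\ell_m - \ell_h(\varphi_m)|^2 - \tfrac16\overline{\mathrm{pen}}_m)_+$ for all $m^\circ \le m \le M_n^+$. Thus, from Lemma A.5 we infer $n\,\mathbb{E}\max_{m^\circ\le m\le M_n^+}(|\widehat\ell_m - \ell_h(\varphi_m)|^2 - \tfrac16\mathrm{pen}_m)_+ \le C$, which completes the proof of the corollary.

□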

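Proof of Theorem 4.2. Similarly to the proof of Theorem 4.1 and since $\widehat{\mathrm{pen}}$ is a nondecreasing sequence we have for all $1 \le m \le \widehat M_n$

$$|\widehat\ell_{\widehat m} - \ell_h(\varphi)|^2 \le \widehat{\mathrm{pen}}_m + \mathrm{bias}_m + \max_{m\le m'\le\widehat M_n}\Big(|\widehat\ell_{m'} - \ell_h(\varphi_{m'})|^2 - \tfrac16\,\widehat{\mathrm{pen}}_{m'}\Big)_+.$$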
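Let us introduce the set

$$\mathcal{A} := \{\mathrm{pen}_m \le \widehat{\mathrm{pen}}_m \le 8\,\mathrm{pen}_m,\ 1 \le m \le M_n^+\} \cap \{M_n^- \le \widehat M_n \le M_n^+\};$$

then we conclude for all $1 \le m \le M_n^-$

$$|\widehat\ell_{\widehat m} - \ell_h(\varphi)|^2\mathbb{1}_{\mathcal{A}} \le \mathrm{pen}_m + \mathrm{bias}_m + \max_{m\le m'\le M_n^+}\Big(|\widehat\ell_{m'} - \ell_h(\varphi_{m'})|^2 - \tfrac16\,\mathrm{pen}_{m'}\Big)_+.$$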

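Thereby, similarly as in the proof of Theorem 4.1 we obtain for all $\varphi \in \mathcal{F}_\gamma^\rho$ and $h \in \mathcal{F}_{1/\gamma}$ the upper bound for all $n \ge 1$

$$\mathbb{E}|\widehat\ell_{\widehat m} - \ell_h(\varphi)|^2\mathbb{1}_{\mathcal{A}} \le C\,\big\{\ldots\,(1 + \log n)\,\ldots\big\}. \tag{A.23}$$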

Let us now evaluate the risk of the adaptive estimator on $\mathcal{A}^c$. From the definition of $M_n^u$ we infer $\|[h]_{M_n^u}\|^2 \le [h]^2\,n$. Consequently, inequality (A.21) together with (A.10) in Lemma A.1, (A.13) in Lemma A.2 and Lemma A.12 yields for all $\varphi \in \mathcal{F}_\gamma^\rho$ and $h \in \mathcal{F}_{1/\gamma}$

$$\mathbb{E}|\widehat\ell_{\widehat m} - \ell_h(\varphi)|^2\mathbb{1}_{\mathcal{A}^c} \le 2[h]^2\,n^2\ldots$$

The result follows by combining the last inequality with (A.23). □
Technical assertions.

The following paragraph gathers technical results used in the proofs of Section 4. In the following we denote $\xi_s(w) := \sum_{j=1}^m s_j f_j(w)$ where $s \in \mathbb{S}^m = \{s \in \mathbb{R}^m : \|s\| = 1\}$.

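Lemma A.7. Let Assumptions 3 and 4 hold. Then for all $n \ge 1$ and $1 \le m \le \lfloor n^{1/4}\rfloor$ we have

$$\sup_{P_{U|W}\in\mathcal{U}_\sigma^\infty}\ \sup_{s\in\mathbb{S}^m}\ \mathbb{E}\Big(\frac1n\Big|\sum_{i=1}^n U_i\,\xi_s(W_i)\Big|^2 - 12\,\mathbb{E}[U^2](1 + \log n)\Big)_+ \le C(\sigma,\eta)\,n^{-1}.$$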



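Proof. Let us denote $\delta = 12\,\mathbb{E}[U^2](1 + \log n)$. Since the error term $U$ satisfies Cramér's condition we may apply Bernstein's inequality and, since $\mathbb{E}[U^2|W] \le \sigma^2$, we have

$$\mathbb{E}\Big[\Big(\frac1n\Big|\sum_{i=1}^n U_i\,\xi_s(W_i)\Big|^2 - \delta\Big)_+\,\Big|\,W_1,\dots,W_n\Big] = \int_0^\infty P\Big(\Big|\sum_{i=1}^n U_i\,\xi_s(W_i)\Big| > \sqrt{n(t+\delta)}\,\Big|\,W_1,\dots,W_n\Big)dt$$
$$\le 2\int_0^\infty\exp\Big(\frac{-n(t+\delta)}{8\sigma^2\sum_{i=1}^n|\xi_s(W_i)|^2}\Big)dt + 2\int_0^\infty\exp\Big(\frac{-\sqrt{n(t+\delta)}}{4\sigma\max_{1\le i\le n}|\xi_s(W_i)|}\Big)dt. \tag{A.24}$$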


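The equality in (A.24) is the tail-integral formula $\mathbb{E}[(X-\delta)_+] = \int_0^\infty P(X > t+\delta)\,dt$ for a nonnegative random variable $X$, applied conditionally on $W_1,\dots,W_n$ with $X = n^{-1}|\sum_{i=1}^n U_i\xi_s(W_i)|^2$, together with $\{X > t+\delta\} = \{|\sum_i U_i\xi_s(W_i)| > \sqrt{n(t+\delta)}\}$. As a minimal sketch, the Python snippet below compares both sides for an empirical distribution, using a midpoint rule for the integral; the sample values and $\delta$ are arbitrary and the function names are ours.

```python
import bisect


def mean_positive_part(xs: list[float], delta: float) -> float:
    """Left-hand side: E[(X - delta)_+] under the empirical law of xs."""
    return sum(max(x - delta, 0.0) for x in xs) / len(xs)


def tail_integral(xs: list[float], delta: float, steps: int = 200_000) -> float:
    """Right-hand side: midpoint rule for the integral over t >= 0 of P(X > t + delta).

    The integrand vanishes once t + delta >= max(xs), so integrating up to
    max(xs) covers the whole support (delta >= 0 is assumed).
    """
    xs_sorted = sorted(xs)
    n = len(xs_sorted)
    upper = xs_sorted[-1]
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        # Number of sample points strictly greater than t + delta.
        above = n - bisect.bisect_right(xs_sorted, t + delta)
        total += above / n
    return total * h


sample = [0.5, 1.0, 2.5, 4.0, 7.25]
print(mean_positive_part(sample, 1.5))  # 1.85
print(round(tail_integral(sample, 1.5), 3))  # 1.85
```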

Consider the first summand of (A.24). Let us introduce the set

where $\delta_{jl} = 1$ if $j = l$ and zero otherwise. Applying the Cauchy-Schwarz inequality twice, we bound $n^{-1}\sum_{i=1}^n|\xi_s(W_i)|^2$ on $\mathcal{B}$ for all $n \ge 1$ and $1 \le m \le M_n^+$, using that $n^{-1/4}\log n \le 3/2$ for all $n \ge 1$. Thereby, it holds $n^{-1}\sum_{i=1}^n|\xi_s(W_i)|^2\mathbb{1}_{\mathcal{B}} \le 3/2$ and thus,



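$$n\,\mathbb{E}\Big[\int_0^\infty\exp\Big(\frac{-n(t+\delta)}{8\sigma^2\sum_{i=1}^n|\xi_s(W_i)|^2}\Big)dt\;\mathbb{1}_{\mathcal{B}}\Big] \le 12\sigma^2\exp\Big(\log n - \frac{\delta}{12\sigma^2}\Big) \le 6\sigma^2. \tag{A.25}$$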


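The proof of Lemma A.7 uses that $n^{-1/4}\log n \le 3/2$ for all $n \ge 1$. Writing $L = \log n$, the map $L \mapsto Le^{-L/4}$ has derivative $e^{-L/4}(1 - L/4)$, so it is maximal at $L = 4$, that is $n = e^4 \approx 54.6$, with value $4/e \approx 1.4715$. The short Python check below (helper names are ours) confirms this over the integers.

```python
import math


def f(n: int) -> float:
    """The quantity n^(-1/4) * log(n) from the proof of Lemma A.7."""
    return n ** -0.25 * math.log(n)


def integer_maximiser(limit: int = 100_000) -> tuple[int, float]:
    """Return (argmax, max) of f over 1 <= n <= limit."""
    best_n = max(range(1, limit + 1), key=f)
    return best_n, f(best_n)


n_star, value = integer_maximiser()
print(n_star, round(value, 4))  # 55 1.4715
print(4 / math.e)  # continuous maximum, attained at n = e^4
```

Since $f$ decreases for $n > e^4$, the bound $4/e < 3/2$ holds for every $n \ge 1$, not only for the range searched.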

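On the complement of $\mathcal{B}$ observe that $\sup_{j,l}\mathrm{Var}(f_j(W)f_l(W)) \le \eta^2$ due to Assumption 3 (i) and thus, Assumption 4 together with Bernstein's inequality yields

$$P(\mathcal{B}^c) \le \sum_{j,l=1}^m P\Big(\Big|\sum_{i=1}^n\big(f_j(W_i)f_l(W_i) - \delta_{jl}\big)\Big| > \sqrt n\log n\Big) \le 2m^2\exp\Big(\frac{-n(\log n)^2}{36n\eta^4 + 6\,\ldots\,\sqrt n\log n}\Big) \le 2\exp\Big(2\log m - \frac{(\log n)^2}{42\eta^4}\Big).$$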



By Assumption 3 (i) it holds $\mathbb{E}|\xi_s(W)|^4 \le \mathbb{E}\big|\sum_{j=1}^m f_j^2(W)\big|^2 \le m^2\eta^4$. Thereby



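$$n\,\mathbb{E}\Big[\int_0^\infty\exp\Big(\frac{-n(t+\delta)}{8\sigma^2\sum_{i=1}^n|\xi_s(W_i)|^2}\Big)dt\;\mathbb{1}_{\mathcal{B}^c}\Big] \le 8\sigma^2 n\big(\mathbb{E}|\xi_s(W_1)|^4\,P(\mathcal{B}^c)\big)^{1/2} \le 12\sigma^2\eta^2 \tag{A.26}$$

for all $n \ge \exp(126\eta^4)$ and $1 \le m \le \lfloor n^{1/4}\rfloor$. For $n < \exp(126\eta^4)$ it holds $n\,\mathbb{E}[|\xi_s(W_1)|^2\mathbb{1}_{\mathcal{B}^c}] \le \exp(126\eta^4)$. Consider the second summand of (A.24). Since $\exp(-1/x)$, $x > 0$, is a concave function and $\mathbb{E}|\xi_s(W)|^4 \le m^2\eta^4$ we deduce for all $1 \le m \le \lfloor n^{1/4}\rfloor$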



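$$\mathbb{E}\int_0^\infty\exp\Big(\frac{-\sqrt{n(t+\delta)}}{4\sigma\max_{1\le i\le n}|\xi_s(W_i)|}\Big)dt \le \int_0^\infty\exp\Big(\frac{-\sqrt{n(t+\delta)}}{4\sigma\,\mathbb{E}\max_{1\le i\le n}|\xi_s(W_i)|}\Big)dt$$
$$\le \int_0^\infty\exp\Big(\frac{-\sqrt{n(t+\delta)}}{4\sigma(n\,\mathbb{E}|\xi_s(W)|^4)^{1/4}}\Big)dt \le \int_0^\infty\exp\Big(\frac{-\sqrt{n(t+\delta)}}{4\sigma\eta\sqrt m\,n^{1/4}}\Big)dt$$
$$\le 8\sigma\eta\sqrt m\,n^{-1/4}\exp\Big(\frac{-n^{1/4}\sqrt\delta}{4\sigma\eta\sqrt m}\Big)\big(\sqrt\delta + 4\sigma\eta\sqrt m\,n^{-1/4}\big) \le C(\sigma,\eta)\,n^{-1}. \tag{A.27}$$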


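The last step in (A.27), and the analogous step in the proof of Lemma A.8, integrates $\exp(-a\sqrt{t+\delta})$ over $t \ge 0$ for some $a > 0$. Substituting $u = \sqrt{t+\delta}$ gives the closed form $\int_0^\infty e^{-a\sqrt{t+\delta}}\,dt = \int_{\sqrt\delta}^\infty 2u\,e^{-au}\,du = \tfrac{2}{a}e^{-a\sqrt\delta}\big(\sqrt\delta + \tfrac1a\big)$; in (A.27) one takes $a = n^{1/4}/(4\sigma\eta\sqrt m)$. The Python sketch below checks the closed form against a plain midpoint rule; $a$, $\delta$ and the helper names are arbitrary choices of ours.

```python
import math


def closed_form(a: float, delta: float) -> float:
    """(2/a) * exp(-a*sqrt(delta)) * (sqrt(delta) + 1/a)."""
    r = math.sqrt(delta)
    return 2.0 / a * math.exp(-a * r) * (r + 1.0 / a)


def midpoint(a: float, delta: float, upper: float = 400.0, steps: int = 400_000) -> float:
    """Midpoint rule for the integral of exp(-a*sqrt(t + delta)) over [0, upper].

    For the test values below the tail beyond `upper` is far below 1e-12.
    """
    h = upper / steps
    return h * sum(math.exp(-a * math.sqrt((k + 0.5) * h + delta)) for k in range(steps))


print(closed_form(2.0, 0.5), midpoint(2.0, 0.5))
```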

The assertion follows now by combining inequality (A. 24) with (A.25), (A.26), and (A.27). 

□ 
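The threshold $n \ge \exp(126\eta^4)$ in (A.26) can be traced as follows, taking the bounds stated in the proof of Lemma A.7 as given: $\mathbb{E}|\xi_s(W_1)|^4 \le m^2\eta^4$, $P(\mathcal{B}^c) \le 2m^2\exp(-(\log n)^2/(42\eta^4))$ and $m \le n^{1/4}$. Then $8n(\mathbb{E}|\xi_s(W_1)|^4P(\mathcal{B}^c))^{1/2} \le 8\sqrt2\,\eta^2\exp\big(\tfrac32\log n - (\log n)^2/(84\eta^4)\big)$, and the exponent is nonpositive as soon as $\log n \ge 126\eta^4$ (because $126^2/84 = 189 = \tfrac32\cdot126$), leaving $8\sqrt2\,\eta^2 \approx 11.3\,\eta^2 \le 12\eta^2$. The Python sketch below (helper names are ours; the common factor $\sigma^2$ is dropped) evaluates this bound on the log scale, since $\exp(126\eta^4)$ overflows quickly.

```python
import math


def log_bound(log_n: float, eta: float) -> float:
    """log of 8*sqrt(2)*eta^2*exp(1.5*log n - (log n)^2 / (84*eta^4)).

    This is the bound on 8 n (E|xi|^4 P(B^c))^(1/2) obtained with m = n^(1/4),
    E|xi|^4 <= m^2 eta^4 and P(B^c) <= 2 m^2 exp(-(log n)^2 / (42 eta^4)).
    """
    return (math.log(8 * math.sqrt(2) * eta**2)
            + 1.5 * log_n - log_n**2 / (84 * eta**4))


def threshold_ok(eta: float, multiples=(1.0, 1.5, 3.0, 10.0)) -> bool:
    """Check the bound stays below log(12 eta^2) for log n = c * 126 eta^4 with c >= 1."""
    target = math.log(12 * eta**2)
    return all(log_bound(c * 126 * eta**4, eta) <= target for c in multiples)


for eta in (0.5, 1.0, 2.0):
    print(eta, threshold_ok(eta))  # True for each eta
print(log_bound(0.5 * 126, 1.0) > math.log(12))  # True: the bound fails below the threshold
```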



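Lemma A.8. Let Assumptions 1 and 3 hold. Then for all $n \ge 1$ and $m \ge 1$ we have

$$\sup_{\varphi\in\mathcal{F}_\gamma^\rho}\ \sup_{s\in\mathbb{S}^m}\ \mathbb{E}\Big(\frac1n\Big|\sum_{i=1}^n(\varphi(Z_i) - \varphi_m(Z_i))\,\xi_s(W_i)\Big|^2 - 48\eta^2\rho\,\frac{m^3}{\gamma_m}(1 + \log n)\Big)_+ \le C(\eta,\gamma,\rho,D)\,n^{-1}.$$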
Proof. Let us consider a sequence $w := (w_j)_{j\ge1}$ with $w_j := j^2$. Since $[T(\varphi - \varphi_m)]_m = 0$ we conclude for $m \ge 1$, $s \in \mathbb{S}^m$, and $k = 2, 3, \dots$ that

$$\mathbb{E}|(\varphi(Z) - \varphi_m(Z))\,\xi_s(W)|^k = \mathbb{E}\Big|\sum_{l=1}^\infty[\varphi - \varphi_m]_l\sum_{j=1}^m s_j\big(e_l(Z)f_j(W) - [T]_{jl}\big)\Big|^k$$
$$\le \|\varphi - \varphi_m\|_w^k\,\mathbb{E}\Big|\sum_{l=1}^\infty l^{-2}\Big|\sum_{j=1}^m s_j\big(e_l(Z)f_j(W) - [T]_{jl}\big)\Big|^2\Big|^{k/2}$$
$$\le \|\varphi - \varphi_m\|_w^k\,m^{k/2}(\pi^2/6)^{k/2}\sup_{j,l\in\mathbb{N}}\mathbb{E}|e_l(Z)f_j(W) - [T]_{jl}|^k,$$

where due to Assumption 3 (i) $\sup_{j,l\in\mathbb{N}}\mathrm{Var}(e_l(Z)f_j(W)) \le \eta^2$ and due to Assumption 3 (ii) it holds $\sup_{j,l\in\mathbb{N}}\mathbb{E}|e_l(Z)f_j(W) - [T]_{jl}|^k \le k!\,\eta^k$ for $k \ge 3$. Moreover, similarly to the proof of (A.13) in Lemma A.2 we conclude $m^{k/2}\|\varphi - \varphi_m\|_w^k \le (m^3\gamma_m^{-1})^{k/2}(2 + 2Dd)^{k/2}\rho^{k/2}$. Let us denote $\mu_m := \eta(1 + Dd)\sqrt{6\rho\,m^3\gamma_m^{-1}}$. Consequently, for all $m \ge 1$ we have $\mathbb{E}|(\varphi(Z) - \varphi_m(Z))\,\xi_s(W)|^2 \le \mu_m^2$ and

$$\sup_{s\in\mathbb{S}^m}\mathbb{E}|(\varphi(Z) - \varphi_m(Z))\,\xi_s(W)|^k \le \mu_m^k\,k! \qquad\text{for } k = 3, 4, \dots \tag{A.28}$$

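Now Bernstein's inequality gives for all $m \ge 1$

$$\sup_{s\in\mathbb{S}^m}\mathbb{E}\Big(\frac1n\Big|\sum_{i=1}^n(\varphi(Z_i) - \varphi_m(Z_i))\,\xi_s(W_i)\Big|^2 - 8\mu_m^2(1 + \log n)\Big)_+$$
$$\le 2\int_0^\infty\exp\Big(\frac{-(t + 8\mu_m^2(1+\log n))}{8\mu_m^2}\Big)dt + 2\int_0^\infty\exp\Big(\frac{-\sqrt{n(t + 8\mu_m^2(1+\log n))}}{4\mu_m}\Big)dt$$
$$\le 16\mu_m^2\exp(-\log n) + 16\mu_m n^{-1/2}\exp\Big(-\sqrt{n(1 + \log n)/2}\Big)\Big(\mu_m\sqrt{8(1 + \log n)} + 4\mu_m n^{-1/2}\Big) \le C(\eta,\gamma,\rho,D)\,n^{-1}$$

and thus, the assertion follows.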



□ 
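The second step in the proof of Lemma A.8 is a weighted Cauchy-Schwarz inequality: with weights $w_l = l^2$, $|\sum_l x_l y_l|^2 \le (\sum_l w_l x_l^2)(\sum_l w_l^{-1} y_l^2)$, applied with $x_l = [\varphi-\varphi_m]_l$, and the inverse weights are summable, $\sum_{l\ge1} l^{-2} = \pi^2/6$. The Python sketch below checks both facts on truncated sequences; the vector length, the random entries and the helper names are arbitrary choices of ours.

```python
import math
import random


def weighted_cs_holds(x: list[float], y: list[float]) -> bool:
    """Check |sum x_l y_l|^2 <= (sum l^2 x_l^2) (sum l^-2 y_l^2), indices l = 1, 2, ..."""
    lhs = sum(a * b for a, b in zip(x, y)) ** 2
    wx = sum((l * l) * a * a for l, a in enumerate(x, start=1))
    wy = sum(b * b / (l * l) for l, b in enumerate(y, start=1))
    return lhs <= wx * wy * (1 + 1e-12)  # relative slack for rounding


def basel_partial_sum(n_terms: int) -> float:
    """Partial sum of l^-2; it increases to pi^2 / 6."""
    return sum(1.0 / (l * l) for l in range(1, n_terms + 1))


rng = random.Random(0)
trials = [([rng.gauss(0, 1) for _ in range(50)], [rng.gauss(0, 1) for _ in range(50)])
          for _ in range(1000)]
print(all(weighted_cs_holds(x, y) for x, y in trials))  # True
print(math.pi ** 2 / 6 - basel_partial_sum(10_000))  # about 1e-4, the tail beyond 10^4
```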



Lemma A.9. Let $T \in \mathcal{T}_d^D$. Then for all $n \ge 1$ it holds

$$P\Big(\bigcup_{m=1}^{M_n^+}\mho_m^c\Big) \le C(h,v,\eta,D)\,n^{-4}, \tag{A.29}$$
$$P\Big(\bigcup_{m=1}^{M_n^+}\Omega_m^c\Big) \le C(h,v,\eta,D)\,n^{-1}. \tag{A.30}$$



Proof. Proof of (A.29). Since $T \in \mathcal{T}_d^D$ we have $\|[T]_m^{-1}\|^2 \le Dv_m^{-1}$ and thus, exploiting Lemma A.3 together with the definition of $M_n^+$ gives

$$n^4P\Big(\bigcup_{m=1}^{M_n^+}\mho_m^c\Big) \le 2\exp\Big(-\frac{(\ldots)}{48(M_n^+)^3} + 3\log M_n^+ + 4\log n\Big) \le C(h,v,\eta,D).$$
Proof of (A.30). Due to the definition of $M_n^+$ there exists some $n_o \ge 1$ such that $n \ge 4Dv_{M_n^+}^{-1}$ for all $n \ge n_o$. Thereby, condition $T \in \mathcal{T}_d^D$ implies $\max_{1\le m\le M_n^+}\|[T]_m^{-1}\|^2 \le Dv_{M_n^+}^{-1} \le n/4$ for all $n \ge n_o$. This gives $\bigcup_{m=1}^{M_n^+}\Omega_m^c \subset \bigcup_{m=1}^{M_n^+}\mho_m^c$ and inequality (A.30) follows by making use of (A.29). If $n < n_o$ then $nP(\bigcup_{m=1}^{M_n^+}\Omega_m^c) \le n_o$ and the assertion follows since $n_o$ only depends on $h$, $v$ and $D$. □

Lemma A.10. Let $T \in \mathcal{T}_d^D$. Then it holds $M_n^- \le M_n \le M_n^+$ for all $n \ge 1$.

Proof. Consider $M_n^- \le M_n$. If $M_n^- = 1$ or $M_n = M_n^u$ the result is trivial. If $M_n = 1$, then clearly $M_n^- = 1$. It remains to consider $M_n^- > 1$ and $M_n^u > M_n > 1$. Due to $T \in \mathcal{T}_d^D$ it holds $\|[T]_{M_n+1}^{-1}\|^{-2} \le Dv_{M_n+1}$ and thus, by the definition of $M_n$ and $M_n^-$ it is easily seen that

$$\frac{v_{M_n^-}}{\max_{1\le j\le M_n^-}[h]_j^2\,(M_n^-)^3} > \frac{4v_{M_n+1}}{\max_{1\le j\le M_n+1}[h]_j^2\,(M_n+1)^3},$$

and thus, $M_n + 1 > M_n^-$, i.e. $M_n \ge M_n^-$. Consider $M_n \le M_n^+$. If $M_n = 1$ or $M_n^+ = M_n^u$ the result is trivial. Otherwise, since $v_m^{-1} \le \|[T]_m^{-1}\|^2\sup_{\|\phi\|_v = 1}\|[T]_m\phi\|^2 \le D\|[T]_m^{-1}\|^2$ due to condition $T \in \mathcal{T}_d^D$ with $d \le D$, the definition of $M_n$ and $M_n^+$ gives

$$\frac{v_{M_n}}{\max_{1\le j\le M_n}[h]_j^2\,M_n^3} > \frac{4v_{M_n^++1}}{\max_{1\le j\le M_n^++1}[h]_j^2\,(M_n^++1)^3}.$$

Thus, $M_n^+ + 1 > M_n$, i.e. $M_n^+ \ge M_n$, which completes the proof. □

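In the following, we make use of the notation $\sigma_Y^2 := \mathbb{E}[Y^2]$ and $\widehat\sigma_Y^2 := n^{-1}\sum_{i=1}^n Y_i^2$. Further, let us introduce the events

$$\mho := \big\{\|Q_m\|\,\|[T]_m^{-1}\| \le 1/4\ \ \forall\,1 \le m \le M_n^+ + 1\big\}, \tag{A.31}$$
$$\mathcal{Q} := \big\{\sigma_Y^2 \le 2\widehat\sigma_Y^2 \le 3\sigma_Y^2\big\}, \tag{A.32}$$
$$\mathcal{J} := \Big\{\|[T]_m^{-1}V_m\|^2 \le \tfrac18\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big)\ \ \forall\,1 \le m \le M_n^+\Big\}. \tag{A.33}$$

Lemma A.11. Let $T \in \mathcal{T}_d^D$. Then it holds $\mho \cap \mathcal{Q} \cap \mathcal{J} \subset \mathcal{A}$.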

Proof. For all $1 \le m \le M_n^+$ observe that condition $\|Q_m\|\,\|[T]_m^{-1}\| \le 1/4$ yields by the usual Neumann series argument that $\|([I]_m + Q_m[T]_m^{-1})^{-1} - [I]_m\| \le 1/3$. Thus, using the identity $[T]_m^{-1} = [\widehat T]_m^{-1} - [T]_m^{-1}\big(([I]_m + Q_m[T]_m^{-1})^{-1} - [I]_m\big)$ we conclude

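Similarly, we have $2\|[T]_m^{-1}v_m\| \le 3\|[\widehat T]_m^{-1}v_m\| \le 4\|[T]_m^{-1}v_m\|$ for all $v_m \in \mathbb{R}^m$. Thereby, since $[T]_m^{-1}[g]_m = [\widehat T]_m^{-1}[\widehat g]_m - [\widehat T]_m^{-1}V_m$ we conclude

$$\|[\widehat T]_m^{-1}[\widehat g]_m\|^2 \le (32/9)\|[T]_m^{-1}V_m\|^2 + 2\|[T]_m^{-1}[g]_m\|^2,$$
$$\|[T]_m^{-1}[g]_m\|^2 \le (32/9)\|[T]_m^{-1}V_m\|^2 + 2\|[\widehat T]_m^{-1}[\widehat g]_m\|^2.$$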
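On $\mathcal{J}$ it holds $\|[T]_m^{-1}V_m\|^2 \le \tfrac18(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2)$. Thereby, the last two inequalities imply

$$\|[T]_m^{-1}[g]_m\|^2 \le (4/9)\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big) + 2\|[\widehat T]_m^{-1}[\widehat g]_m\|^2,$$
$$\|[\widehat T]_m^{-1}[\widehat g]_m\|^2 \le (22/9)\|[T]_m^{-1}[g]_m\|^2 + (4/9)\sigma_Y^2.$$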

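The perturbation bounds in the proof of Lemma A.11 can be checked numerically. The sketch below assumes, as the identity used there suggests, that $[\widehat T]_m = [T]_m + Q_m$ (the definition of $Q_m$ is not restated in this section). It works with the Frobenius norm, which dominates the operator norm and is submultiplicative, so the hypothesis $\|Q_m\|_F\|[T]_m^{-1}\|_F \le 1/4$ is stronger than the one defining $\mho$, and the Neumann bound $\|(I + A)^{-1} - I\| \le \|A\|/(1-\|A\|) \le 1/3$ holds in either norm. The matrix helpers are hypothetical, written in plain Python to keep the example dependency-free.

```python
import random

Matrix = list[list[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def lin(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Matrix, v: list[float]) -> list[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + e for row, e in zip(a, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fro(a: Matrix) -> float:
    return sum(x * x for row in a for x in row) ** 0.5


def norm(v: list[float]) -> float:
    return sum(x * x for x in v) ** 0.5


def neumann_check(m: int = 4, seed: int = 1) -> dict:
    rng = random.Random(seed)
    T = [[(3.0 if i == j else 0.0) + rng.uniform(-0.5, 0.5) for j in range(m)] for i in range(m)]
    T_inv = inverse(T)
    raw = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
    scale = 0.25 / (fro(raw) * fro(T_inv))  # enforce ||Q||_F ||T^{-1}||_F = 1/4
    Q = [[scale * x for x in row] for row in raw]
    T_hat_inv = inverse(lin(T, Q))  # T_hat = T + Q (assumption)
    I = identity(m)
    R = lin(inverse(lin(I, matmul(Q, T_inv))), I, sign=-1.0)  # (I + Q T^{-1})^{-1} - I
    # Identity from the proof: T^{-1} = T_hat^{-1} - T^{-1} R.
    residual = fro(lin(T_inv, lin(T_hat_inv, matmul(T_inv, R), sign=-1.0), sign=-1.0))
    # Vector bounds: 2||T^{-1} v|| <= 3||T_hat^{-1} v|| <= 4||T^{-1} v||.
    vectors = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(20)]
    sandwich = all(
        2 * norm(matvec(T_inv, v)) <= 3 * norm(matvec(T_hat_inv, v)) + 1e-12
        and 3 * norm(matvec(T_hat_inv, v)) <= 4 * norm(matvec(T_inv, v)) + 1e-12
        for v in vectors
    )
    return {"residual": residual, "neumann_norm": fro(R), "sandwich": sandwich}


print(neumann_check())
```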
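On $\mathcal{Q}$ it holds $\sigma_Y^2 \le 2\widehat\sigma_Y^2 \le 3\sigma_Y^2$, which gives

$$(5/9)\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big) \le 2\widehat\sigma_Y^2 + 2\|[\widehat T]_m^{-1}[\widehat g]_m\|^2,$$
$$\|[\widehat T]_m^{-1}[\widehat g]_m\|^2 + \widehat\sigma_Y^2 \le (22/9)\|[T]_m^{-1}[g]_m\|^2 + (35/18)\sigma_Y^2.$$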

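Combining the last two inequalities we conclude for all $1 \le m \le M_n^+$

$$(5/18)\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big) \le \|[\widehat T]_m^{-1}[\widehat g]_m\|^2 + \widehat\sigma_Y^2 \le (22/9)\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big).$$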

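The numerical constants in the proof of Lemma A.11 follow from a few rational computations, which the Python sketch below reproduces with exact fractions. Write $a = \|[T]_m^{-1}[g]_m\|^2$, $b = \|[\widehat T]_m^{-1}[\widehat g]_m\|^2$, $s = \sigma_Y^2$ and $\widehat s = \widehat\sigma_Y^2$. The factor $32/9 = 2\cdot(4/3)^2$ comes from $(x+y)^2 \le 2x^2 + 2y^2$ and $\|[\widehat T]_m^{-1}v\| \le (4/3)\|[T]_m^{-1}v\|$; on $\mathcal{J}$ it is multiplied by $1/8$. On $\mathcal{Q}$ we use $s \le 2\widehat s$ and $\widehat s \le (3/2)s$; the intermediate value $35/18 = 4/9 + 3/2$ is our own bookkeeping, and the variable names are ours.

```python
from fractions import Fraction as F

c_v = 2 * F(4, 3) ** 2          # 32/9
coef = c_v * F(1, 8)            # on J: (32/9) * (1/8) = 4/9

# a <= coef*(a + s) + 2b  =>  (1 - coef)(a + s) <= s + 2b <= 2 s_hat + 2b
lower = (1 - coef) / 2          # coefficient of (a + s) in the final lower bound

# b <= (2 + coef) a + coef s   and   s_hat <= (3/2) s
upper_a = 2 + coef              # 22/9
upper_s = coef + F(3, 2)        # 35/18, which does not exceed 22/9

# Squaring 2||.|| <= 3||.^|| <= 4||.|| gives 4 Delta <= 9 Delta_hat <= 16 Delta.
delta_bounds = (2 ** 2, 3 ** 2, 4 ** 2)

print(c_v, coef, lower, upper_a, upper_s, delta_bounds)
print(lower * 18, upper_a * 18)  # 5 44
```

Multiplying the sandwich by 18 gives the integer form $5X \le 18\widehat X \le 44X$ used right after this step.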
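Consequently, we have

$$\mho \cap \mathcal{Q} \cap \mathcal{J} \subset \Big\{4\Delta_m \le 9\widehat\Delta_m \le 16\Delta_m \text{ and } 5\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big) \le 18\big(\|[\widehat T]_m^{-1}[\widehat g]_m\|^2 + \widehat\sigma_Y^2\big) \le 44\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big)\ \ \forall\,1 \le m \le M_n^+\Big\}$$

and thus, $\mho \cap \mathcal{Q} \cap \mathcal{J} \subset \{\mathrm{pen}_m \le \widehat{\mathrm{pen}}_m \le 18\,\mathrm{pen}_m\ \ \forall\,1 \le m \le M_n^+\}$. Moreover, it holds $\mho \subset \{M_n^- \le \widehat M_n \le M_n^+\}$, which can be seen as follows. Consider $\{\widehat M_n < M_n^-\}$. In case of $\widehat M_n = M_n^u$ or $M_n^- = 1$ clearly $\{\widehat M_n < M_n^-\} = \emptyset$. Otherwise by the definition of $\widehat M_n$ it holds

$$\{\widehat M_n < M_n^-\} = \bigcup_{m=1}^{M_n^- - 1}\{\widehat M_n = m\} \subset \Big\{\exists\,2 \le m \le M_n^- : m^3\|[\widehat T]_m^{-1}\|^2\max_{1\le j\le m}[h]_j^2 > a_n\Big\}.$$

By the definition of $M_n^-$ and the property $\|[T]_m^{-1}\|^2 \le Dv_m^{-1}$ there exists $2 \le m \le M_n^-$ such that on $\{\widehat M_n < M_n^-\}$ it holds $\|[\widehat T]_m^{-1}\|^2 > 4Dv_m^{-1} \ge 4\|[T]_m^{-1}\|^2$ and thereby,

$$\{\widehat M_n < M_n^-\} \subset \big\{\exists\,2 \le m \le M_n^- : \|[\widehat T]_m^{-1}\|^2 \ge 4\|[T]_m^{-1}\|^2\big\}. \tag{A.34}$$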

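Consider $\{\widehat M_n > M_n^+\}$. In case of $\widehat M_n = 1$ or $M_n^+ = M_n^u$ clearly $\{\widehat M_n > M_n^+\} = \emptyset$. Otherwise, condition $T \in \mathcal{T}_d^D$ with $d \le D$ implies $v_m^{-1} \le D\|[T]_m^{-1}\|^2$ as seen in the proof of Lemma A.10. Thereby, we conclude similarly as above

$$\{\widehat M_n > M_n^+\} \subset \big\{\exists\,1 \le m \le M_n^+ + 1 : \|[T]_m^{-1}\|^2 \ge 4\|[\widehat T]_m^{-1}\|^2\big\}. \tag{A.35}$$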

Again applying the Neumann series argument we observe

$$\mho \subset \big\{\forall\,1 \le m \le M_n^+ + 1 : 2\|[T]_m^{-1}\| \le 3\|[\widehat T]_m^{-1}\| \le 4\|[T]_m^{-1}\|\big\},$$

which combined with (A.34) and (A.35) yields $\{M_n^- \le \widehat M_n \le M_n^+\}^c \subset \mho^c$ and thus, completes the proof. □

Lemma A.12. Under the conditions of Theorem 4.2 we have for all $n \ge 1$

$$n^4(M_n^u)^4P(\mathcal{A}^c) \le C.$$
Proof. Due to Lemma A.11 it holds $n^4(M_n^u)^4P(\mathcal{A}^c) \le n^4(M_n^u)^4\{P(\mho^c) + P(\mathcal{J}^c) + P(\mathcal{Q}^c)\}$. Therefore, the assertion follows if the right hand side is bounded by a constant $C$, which we prove in the following. Consider $\mho$. From condition $T \in \mathcal{T}_d^D$ and Lemma A.3 we infer

$$n^4(M_n^u)^4P(\mho^c) \le 2\exp\Big(-\frac{n\,v_{M_n^++1}}{128D\eta^2(M_n^++1)^2} + 3\log(M_n^+ + 1) + 5\log n\Big) \le C(h,v,\eta,D), \tag{A.36}$$

where the last inequality is due to condition $(M_n^+ + 1)^2\log n = o(n\,v_{M_n^++1})$. Consider $\mathcal{Q}$.
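Due to condition $m^3\gamma_m^{-1} = o(1)$ as $m \to \infty$ and $P_{U|W} \in \mathcal{U}_\sigma^{20}$ we observe $\mathbb{E}[Y^k] \le 2^k(\mathbb{E}[\varphi^k(Z)] + \mathbb{E}[U^k]) \le C(\gamma,\rho,\sigma)\sup_{j\ge1}\mathbb{E}[e_j^k(Z)]$. Thereby, the assumption $\sup_{j\ge1}\mathbb{E}[e_j^{20}(Z)] \le \eta^{20}$ together with Theorem 2.10 in Petrov [1995] implies

$$n^4(M_n^u)^4P(\mathcal{Q}^c) \le n^5P\big(|\widehat\sigma_Y^2 - \sigma_Y^2| > \sigma_Y^2/2\big) \le 1024\,\sigma_Y^{-20}\,n^5\,\mathbb{E}\Big|n^{-1}\sum_{i=1}^nY_i^2 - \sigma_Y^2\Big|^{10}$$
$$\le 1024\,C\,\sigma_Y^{-20}\,\mathbb{E}|Y^2 - \sigma_Y^2|^{10} \le C(\gamma,\rho,\sigma,\eta). \tag{A.37}$$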


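The constant $1024 = 2^{10}$ in (A.37) is Markov's inequality applied to the tenth power: $P(|X| > \sigma_Y^2/2) \le (2/\sigma_Y^2)^{10}\,\mathbb{E}|X|^{10}$ with $X = \widehat\sigma_Y^2 - \sigma_Y^2$. The Python sketch below checks this form on an empirical distribution. The Gaussian sample (single draws, for which $\sigma_Y^2 = 1$) and the helper name are purely illustrative, and the moment bound from Petrov's Theorem 2.10 is not reproduced.

```python
import random


def markov_tenth(xs: list[float], sigma2: float) -> tuple[float, float]:
    """Empirical P(|X| > sigma2/2) and the Markov bound 1024 * sigma2^-10 * E|X|^10."""
    n = len(xs)
    prob = sum(1 for x in xs if abs(x) > sigma2 / 2) / n
    tenth_moment = sum(abs(x) ** 10 for x in xs) / n
    return prob, 1024 * sigma2 ** -10 * tenth_moment


rng = random.Random(0)
# X = Y^2 - sigma_Y^2 for single draws Y ~ N(0, 1), so sigma_Y^2 = 1.
sample = [rng.gauss(0, 1) ** 2 - 1.0 for _ in range(10_000)]
prob, bound = markov_tenth(sample, 1.0)
print(prob, bound, prob <= bound)
```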

Consider $\mathcal{J}$. For all $m \ge 1$ observe that the centered random variables $(Y_i - \varphi(Z_i))f_j(W_i)$, $1 \le i \le n$, satisfy Cramér's condition (A.28) with $\mu_m = \eta(1 + Dd)\sqrt{6\rho\,m^3\gamma_m^{-1}} \le C(\eta,\gamma,\rho,D)$. From (A.13) in Lemma A.2, $\varphi \in \mathcal{F}_\gamma^\rho$, and $P_{U|W} \in \mathcal{U}_\sigma^{20}$ we infer $\|\varphi_m\|_Z^2 + \sigma_Y^2 \le 4(2 + Dd)\rho + 2\sigma^2$. Moreover, it holds $\|[T]_m^{-1}v_m\|^2 \le Dv_m^{-1}\|v_m\|^2$ by employing condition $T \in \mathcal{T}_d^D$. Now Bernstein's inequality yields for all $1 \le m \le M_n^+$

$$P\Big(\|[T]_m^{-1}V_m\|^2 > \tfrac18\big(\|[T]_m^{-1}[g]_m\|^2 + \sigma_Y^2\big)\Big) \le \sum_{j=1}^m P\Big(\Big|\sum_{i=1}^n\ldots\Big| > \ldots\Big) \le 2m\exp\Big(-\frac{\ldots}{32Dn\mu_m^2 + \ldots}\Big) \le 2\exp\Big(7\log n - \frac{n\,v_{M_n^+}}{M_n^+\,C(\sigma,\eta,\gamma,\rho,D)}\Big).$$

Due to the definition of $M_n^+$ the last estimate implies $n^4(M_n^u)^4P(\mathcal{J}^c) \le C$, which completes the proof. □